Publication


Featured research published by Markus Diem.


international conference on document analysis and recognition | 2013

CVL-DataBase: An Off-Line Database for Writer Retrieval, Writer Identification and Word Spotting

Florian Kleber; Stefan Fiel; Markus Diem; Robert Sablatnig

In this paper a public database for writer retrieval, writer identification and word spotting is presented. The CVL-Database consists of 7 different handwritten texts (1 German and 6 English texts) and 311 different writers. For each text an RGB color image (300 dpi) comprising the handwritten text and the printed text sample is available, as well as a cropped version (only handwritten). A unique ID identifies the writer, whereas the bounding boxes for each single word are stored in an XML file. An evaluation of the best algorithms of the ICDAR and ICFHR writer identification contests has been performed on the CVL-Database.


international conference on document analysis and recognition | 2009

Recognition of Degraded Handwritten Characters Using Local Features

Markus Diem; Robert Sablatnig

The main problems of Optical Character Recognition (OCR) systems are solved if printed Latin text is considered. Since OCR systems are based upon binary images, their results are poor if the text is degraded. In this paper a codex consisting of ancient manuscripts is investigated. Due to environmental effects the characters of the analyzed codex are washed out, which leads to poor results from state-of-the-art binarization methods. Hence, a segmentation-free approach based on local descriptors is developed. Using local information allows for recognizing characters that are only partially visible. In order to recognize a character, the local descriptors are initially classified with a Support Vector Machine (SVM) and then identified by a voting scheme of neighboring local descriptors. State-of-the-art local descriptor systems are evaluated in this paper in order to compare their performance for the recognition of degraded characters.


international conference on document analysis and recognition | 2011

Layout Analysis for Historical Manuscripts Using SIFT Features

Angelika Garz; Robert Sablatnig; Markus Diem

We propose a layout analysis method for historical manuscripts that relies on the part-based identification of layout entities. A layout entity -- such as letters of the text, initials or headings -- is composed of a set of characteristic segments or structures, which is dissimilar for distinct classes in the manuscripts under consideration. This fact is exploited in order to segment a manuscript page into homogeneous regions. Historical documents traditionally involve challenges such as uneven writing support and varying shapes of characters, fluctuating text lines, changing scripts and writing styles, and variance in the layout itself. Hence, a part-based detection of layout entities is proposed using a multi-stage algorithm for the localization of the entities, based on interest points. Results show that the proposed method is able to locate initials, headings and text areas in ancient manuscripts containing stains, tears and partially faded-out ink sufficiently well.


international conference on document analysis and recognition | 2013

ICDAR 2013 Competition on Handwritten Digit Recognition (HDRC 2013)

Markus Diem; Stefan Fiel; Angelika Garz; Manuel Keglevic; Florian Kleber; Robert Sablatnig

This paper presents the results of the HDRC 2013 competition for recognition of handwritten digits organized in conjunction with ICDAR 2013. The general objective of this competition is to identify, evaluate and compare recent developments in character recognition and to introduce a new challenging dataset for benchmarking. We describe competition details including dataset and evaluation measures used, and give a comparative performance analysis of the nine (9) submitted methods along with a short description of the respective methodologies.


document analysis systems | 2014

End-to-End Text Recognition Using Local Ternary Patterns, MSER and Deep Convolutional Nets

Michael Opitz; Markus Diem; Stefan Fiel; Florian Kleber; Robert Sablatnig

Text recognition in natural scene images is a component of several computer vision applications such as licence plate recognition, automated translation of street signs, help for visually impaired people or image retrieval. In this work an end-to-end text recognition system is presented. For detection, an AdaBoost ensemble with a modified Local Ternary Pattern (LTP) feature set is used, with a post-processing stage built upon Maximally Stable Extremal Regions (MSER). The text recognition is done using a deep Convolutional Neural Network (CNN) trained with backpropagation. The system presented outperforms state-of-the-art methods on the ICDAR 2003 dataset in the text-detection (F-Score: 74.2%), dictionary-driven cropped-word recognition (F-Score: 87.1%) and dictionary-driven end-to-end recognition (F-Score: 72.6%) tasks.


document analysis systems | 2010

Document analysis applied to fragments: feature set for the reconstruction of torn documents

Markus Diem; Florian Kleber; Robert Sablatnig

Document analysis is done to analyze entire forms (e.g. intelligent form analysis, table detection) or to describe the layout/structure of a document. In this paper document analysis is applied to snippets of torn documents to calculate features that can be used for reconstruction. The main intention is to handle snippets of varying size and different contents (e.g. handwritten or printed text). Documents can be destroyed either intentionally, to make the printed content unavailable (e.g. business crime), or through time-induced degradation of ancient documents (e.g. bad storage conditions). Current reconstruction methods for manually torn documents rely on the shape or on, e.g., inpainting and texture synthesis techniques. In this paper the potential of document analysis techniques applied to snippets to support a reconstruction algorithm by considering additional features is shown. This implies a rotational analysis, a color analysis, a line detection, a paper type analysis (checked, lined, blank) and a classification of the text (printed or handwritten). Preliminary results show that these features can be determined reliably on a real dataset consisting of 690 snippets.


Proceedings of SPIE | 2010

Recognizing characters of ancient manuscripts

Markus Diem; Robert Sablatnig

Considering printed Latin text, the main issues of Optical Character Recognition (OCR) systems are solved. However, for degraded handwritten document images, basic preprocessing steps such as binarization yield poor results with state-of-the-art methods. In this paper ancient Slavonic manuscripts from the 11th century are investigated. In order to minimize the consequences of false character segmentation, a binarization-free approach based on local descriptors is proposed. Additionally, local information allows the recognition of partially visible or washed-out characters. The proposed algorithm consists of two steps: character classification and character localization. Initially, Scale Invariant Feature Transform (SIFT) features are extracted, which are subsequently classified using Support Vector Machines (SVM). Afterwards, the interest points are clustered according to their spatial information. Thereby, characters are localized and finally recognized based on a weighted voting scheme of pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded manuscript images with background clutter (e.g. stains, tears) and faded-out characters.
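The final voting step described in this abstract can be illustrated with a minimal sketch. It assumes each local descriptor has already been assigned to a spatial cluster (one cluster per character candidate) and carries per-class probabilities from a classifier. The function name `vote_characters`, the input layout, and the weights are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import defaultdict


def vote_characters(descriptors):
    """Pick one class label per cluster by weighted voting.

    descriptors: iterable of (cluster_id, class_probs, weight), where
    class_probs maps a class label to a probability. The weights are an
    assumption of this sketch, not the weighting used in the paper.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for cluster_id, class_probs, weight in descriptors:
        for label, prob in class_probs.items():
            # Each descriptor adds its weighted class probability to its cluster.
            totals[cluster_id][label] += weight * prob
    # The label with the highest accumulated score wins in each cluster.
    return {cid: max(scores, key=scores.get) for cid, scores in totals.items()}


# Hypothetical pre-classified descriptors for two character candidates.
descriptors = [
    (0, {"a": 0.7, "o": 0.3}, 1.0),
    (0, {"a": 0.4, "o": 0.6}, 0.5),
    (1, {"a": 0.2, "o": 0.8}, 1.0),
]
result = vote_characters(descriptors)
print(result)
```

Accumulating probabilities rather than hard labels lets a partially visible character still be recognized when only a few of its descriptors are confident.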


international conference on document analysis and recognition | 2013

Text Line Detection for Heterogeneous Documents

Markus Diem; Florian Kleber; Robert Sablatnig

Text line detection is a pre-processing step for automated document analysis such as word spotting or OCR. It is additionally used for document structure analysis or layout analysis. Considering mixed layouts, degraded documents and handwritten documents, text line detection is still challenging. We present a novel approach that targets torn documents having varying layouts and writing. The proposed method is a bottom-up approach that fuses words so as to globally minimize their fusing distance. In order to improve processing time and further layout analysis, text lines are represented by oriented rectangles. Even though the method was designed for modern handwritten and printed documents, tests on medieval manuscripts give promising results. Additionally, the text line detection was evaluated on the ICDAR 2009 and ICFHR 2010 Handwriting Segmentation Contest datasets.


international conference on frontiers in handwriting recognition | 2010

Are Characters Objects?

Markus Diem; Robert Sablatnig

This paper presents a character recognition system that handles degraded manuscript documents like the ones discovered at the St. Catherine’s Monastery. In contrast to state-of-the-art OCR systems, no early decision (image binarization) needs to be performed. Thus, an object recognition methodology is adapted for the recognition of ancient manuscripts. The proposed system is based on local descriptors which are clustered in order to localize characters. Finally, a class probability histogram is assigned to each character present in an image which allows for the character classification. The system achieves an F0.5 score of 0.77 on real world data that contains 13.5% highly degraded characters.
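For readers unfamiliar with the F0.5 measure reported above, the following sketch computes the standard F-beta score from precision and recall. With beta = 0.5, precision is weighted more heavily than recall. The precision and recall values in the usage lines are hypothetical and are not the paper's results.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero, so the score is defined as 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical values: high precision, moderate recall.
score = f_beta(precision=1.0, recall=0.5, beta=0.5)
print(round(score, 4))
```

With equal precision and recall the score equals that common value for any beta, which makes a quick sanity check.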


international conference on frontiers in handwriting recognition | 2010

Detecting Text Areas and Decorative Elements in Ancient Manuscripts

Angelika Garz; Markus Diem; Robert Sablatnig

An approach for the detection of decorative elements, such as initials and headlines, and text regions, focused on ancient manuscripts, is presented. Due to their age, ancient manuscripts suffer from degradation and staining, and their ink has faded over time. Identifying decorative elements and text regions allows indexing a manuscript and serves as input for Optical Character Recognition (OCR) as it localizes regions of interest within document pages. We propose a robust method inspired by state-of-the-art object recognition methodologies. Scale Invariant Feature Transform (SIFT) descriptors are chosen to detect the regions of interest, and the scale of the interest points is used for localization. The classification is based on the fact that local properties of the decorative elements are different from those of regular text. The results show that the method is able to locate regular text in ancient manuscripts. The detection rate of decorative elements is not as high as for regular text but already yields promising results.

Collaboration

Top Co-Authors

Robert Sablatnig, Vienna University of Technology

Florian Kleber, Vienna University of Technology

Stefan Fiel, Vienna University of Technology

Fabian Hollaus, Vienna University of Technology

Melanie Gau, Vienna University of Technology

Martin Kampel, Vienna University of Technology

Martin Lettner, Vienna University of Technology

Michael Dworzak, Medical University of Vienna