Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage | 2019

OCR-D: An end-to-end open source OCR framework for historical printed documents

Abstract


Various research projects have addressed the development and adaptation of OCR methods specifically for historical printed documents (cf. METAe [20], IMPACT [1], eMOP [9]). However, these initiatives ended before the wide adoption of deep neural networks and, despite the various projects' achievements, there remains a lack of OCR software that is a) comprehensive with regard to the challenges presented by the wide variety of historical documents and b) available as ready-to-use Free Software. The OCR-D project aims to rectify that. In this paper we introduce the background of OCR-D, describe the main challenges and shortcomings in the availability of open tools and resources for OCR of historical printed documents, and discuss the various software modules and related components (repositories, workflows) that are being made available through OCR-D. Finally, we provide an outlook on a number of remaining challenges that are not addressed by OCR-D and point out several examples of the positive community effects that have arisen through the creation and sharing of open resources for historical German OCR.

DOI 10.1145/3322905.3322917
Language English
Journal Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage

Full Text