Publication


Featured research published by Bertin Klein.


international conference on document analysis and recognition | 2001

Three approaches to "industrial" table spotting

Bertin Klein; Serdar Gökkus; Thomas Kieninger; Andreas Dengel

This paper introduces three approaches that enable an industrial, comprehensive document analysis system to spot tables in documents. Searching for a set of known table headers (approach 1) works rather well in a significant number of documents. But this approach, although implemented to be tolerant of OCR errors, is not tolerant enough of some kinds of even minor aberrations. This not only degrades the recognition results but, even worse, makes users feel uncomfortable. Pragmatically trying to mimic what the human eye might key on leads to our two further, complementary approaches: searching for layout structures that resemble parts of columns (approach 2), and searching for groupings of similar lines (approach 3). To be suitable for our system, the approaches must be very simple to implement, simple to explain to users, computationally cheap, and combinable. In the domain of health insurance companies, which receive huge amounts of so-called medical liquidations on a daily basis, we obtain very good results. On document samples representative of the everyday practice of five customers (health insurance companies), tables were spotted as well and as fast as the customers expected of the system. We thus consider our current approaches a step towards cognitive adequacy.
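As an illustration of approach 1 (searching for known table headers while tolerating OCR errors), here is a minimal Python sketch. It is not the paper's implementation: the header list, the `threshold` value, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical set of known table headers; not taken from the paper.
KNOWN_HEADERS = [
    "Datum Ziffer Leistung Betrag",
    "Date Code Description Amount",
]


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def spot_header_lines(lines, headers=KNOWN_HEADERS, threshold=0.8):
    """Return (line_index, matched_header, score) for header-like lines."""
    hits = []
    for i, line in enumerate(lines):
        text = " ".join(line.split())  # collapse runs of whitespace
        if not text:
            continue
        # Pick the known header that is most similar to this line.
        best = max(headers, key=lambda h: similarity(text, h))
        score = similarity(text, best)
        if score >= threshold:
            hits.append((i, best, score))
    return hits


page = [
    "Dr. med. Example",
    "Date  C0de  Descripti0n  Amount",  # OCR confused 'o' with '0'
    "01.02.2001  1  Consultation  10.72",
]
hits = spot_header_lines(page)
for index, header, score in hits:
    print(f"line {index} looks like header {header!r} (score {score:.2f})")
```

A fixed similarity threshold of this kind also shows the weakness the abstract mentions: a header that deviates slightly more than the threshold allows is missed entirely, which is what motivates the layout-based approaches 2 and 3.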


document analysis systems | 2004

Results of a Study on Invoice-Reading Systems in Germany

Bertin Klein; Stefan Agne; Andreas Dengel

Companies order, receive, and pay for goods. Hence they continually receive and process invoices. For the most part these are printed on paper and are dealt with manually, so that each invoice after receipt involves processing costs of about 9 Euro on average. Often, human searching and typing of data into computer forms is required to transfer the information from paper into the computer, e.g. into ERP-systems, like SAP, that many companies run. This article presents the main results of our 300-page market survey of 11 suppliers of invoice reading systems (IRS), which automate the transfer of invoice data to ERP-systems. For the scientific IRS community we hope to provide the service of a better visibility of our discipline to potential investors and users.


International Journal on Digital Libraries | 1997

Error tolerant document structure analysis

Bertin Klein; Peter Fankhauser

Abstract. Successful applications of digital libraries require structured access to sources of information. This paper presents an approach to extract the logical structure of text documents. The extracted structure is explicated by means of SGML (Standard Generalized Markup Language). Consequently, the extraction is achieved on the basis of grammars that extend SGML with recognition rules. From these grammars parsing automata are generated. These automata are used to partition a flat text document into its elements, to discard formatting information, and to insert SGML markups. Complex document structures and fallback rules needed for error tolerant parsing make such automata highly ambiguous. A novel parsing strategy has been developed that ranks and prunes ambiguous parsing paths.


document analysis systems | 2002

smartFIX: A Requirements-Driven System for Document Analysis and Understanding

Andreas Dengel; Bertin Klein

Although the internet offers a wide-spread platform for information interchange, day-to-day work in large companies still means the processing of tens of thousands of printed documents every day. This paper presents the system smartFIX which is a document analysis and understanding system developed by the DFKI spin-off INSIDERS. It permits the processing of documents ranging from fixed format forms to unstructured letters of any format. Apart from the architecture, the main components and system characteristics, we also show some results when applying smartFIX to medical bills and prescriptions.


conference on information visualization | 2006

Dynamic Visualization and Navigation of Semantic Virtual Environments

Katja Einsfeld; Stefan Agne; Matthias Deller; Achim Ebert; Bertin Klein; Christian Reuschling

Although information visualization claims to provide the means to induce mental models of any kind of data, the visualization of semantic information is still an open field of research. Existing approaches either concentrate on the visualization of documents without additional metadata or produce unintuitive expert graphics. This paper seeks to fill this gap by presenting a semantic information visualization system with a dynamic 3D interface and intuitive metaphors. The application, called DocuWorld, visualizes documents, document metadata, and semantic relations between documents. The general visualization and navigation metaphor, called the Thought Wizard Metaphor, allows user- and context-sensitive adaptation of visualization modes and visualization environments.


Lecture Notes in Computer Science | 2004

smartFIX: An Adaptive System for Document Analysis and Understanding

Bertin Klein; Andreas Dengel; Andreas Fordan

The internet is certainly a wide-spread platform for information interchange today and the semantic web actually seems to become more and more real. However, day-to-day work in companies still necessitates the laborious, manual processing of huge amounts of printed documents. This article presents the system smartFIX, a document analysis and understanding system developed by the DFKI spin-off insiders. During the research project “adaptive Read”, funded by the German ministry for research, BMBF, smartFIX was fundamentally developed to a higher maturity level, with a focus on adaptivity. The system is able to extract information from documents – documents ranging from fixed format forms to unstructured letters of many formats. Apart from the architecture, the main components and the system characteristics, we also show some results from the application of smartFIX to representative samples of medical bills and prescriptions.


Proceedings of the 1st ACM international workshop on Human-centered multimedia | 2006

Human-centered interaction with documents

Andreas Dengel; Stefan Agne; Bertin Klein; Achim Ebert; Matthias Deller

In this paper, we discuss a new user interface, a complementary environment for working with personal document archives, i.e. for document filing and retrieval. We introduce our implementation of a spatial medium for document interaction, explorative search and active navigation, which exploits and further stimulates the human strengths of visual information processing. Our system achieves a high degree of immersion, so that users forget the artificiality of their environment. This is achieved through a tripartite ensemble: allowing users to interact naturally with gestures and postures (optionally, users can teach the system individual gestures and postures), exploiting 3D technology, and supporting users in maintaining the structures they discover while also providing computer-calculated semantic structures. Our ongoing evaluation shows that even non-expert users can work efficiently with the information in a document collection, and have fun.


International Journal on Document Analysis and Recognition | 2003

Problem-adaptable document analysis and understanding for high-volume applications

Bertin Klein; Andreas R. Dengel

Abstract. Although the Internet is increasingly emerging as “the” widespread platform for information interchange, day-to-day work in companies still necessitates the laborious, manual processing of huge amounts of printed documents. This article presents the system smartFIX, a document analysis and understanding system developed by the DFKI spin-off insiders technologies. It enables the automatic processing of documents ranging from fixed format forms to unstructured letters of any format. In addition to the architecture, main components, and system characteristics, we also show some results from the application of smartFIX to medical bills and prescriptions.


document analysis systems | 2006

On benchmarking of invoice analysis systems

Bertin Klein; Stefan Agne; Andreas Dengel

An approach is presented to guide the benchmarking of invoice analysis systems, a specific, applied subclass of document analysis systems. The state of the art of benchmarking of document analysis systems is presented, based on the processing levels: Document Page Segmentation, Text Recognition, Document Classification, and Information Extraction. The restriction to invoices enables and requires a more purposeful, i.e. detailed, targeting of the benchmarking procedures (acquisition of ground truth data, system runs, comparison of data, condensation into meaningful numbers). Therefore the processing of invoices is dissected. The data structures involved are elicited and presented. These are provided as the building blocks of the actual benchmarking of invoice analysis systems.
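The last two benchmarking steps named above (comparison of data, condensation into meaningful numbers) can be sketched as follows for the information extraction level. This is a hedged illustration only: the field names, the exact-match comparison, and the choice of precision and recall as the condensed numbers are assumptions, not the data structures or measures defined in the paper.

```python
def score_fields(ground_truth: dict, extracted: dict) -> dict:
    """Compare extracted invoice fields with ground truth.

    A field counts as correct only if the system produced it and its
    value equals the ground-truth value exactly.
    """
    correct = sum(
        1 for name, value in extracted.items()
        if name in ground_truth and ground_truth[name] == value
    )
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return {"correct": correct, "precision": precision, "recall": recall}


# Hypothetical ground truth and system output for one invoice.
truth = {
    "invoice_number": "4711",
    "date": "2006-02-13",
    "total": "119.00",
    "vat_id": "DE123",
}
output = {
    "invoice_number": "4711",
    "date": "2006-02-18",  # misread digit
    "total": "119.00",
}
result = score_fields(truth, output)
print(result)
```

In this example the system returns three fields, two of them correct, and misses one of the four ground-truth fields, so precision is 2/3 and recall is 1/2.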


international conference on document analysis and recognition | 2003

Evaluating SEE: a benchmarking system for document page segmentation

Stefan Agne; Andreas Dengel; Bertin Klein

The decomposition of a document into segments such as text regions and graphics is a significant part of the document analysis process. The basic requirement for rating and improvement of page segmentation algorithms is systematic evaluation. The approaches known from the literature have the disadvantage that manually generated reference data (zoning ground truth) are needed for the evaluation task. The effort and cost of creating these data are very high. This paper describes the evaluation system SEE and presents an assessment of its quality. The system requires the OCR-generated text and the original text of the document in correct reading order (text ground truth) as input. No manually generated zoning ground truth is needed. The implicit structure information that is contained in the text ground truth is used for the evaluation of the automatic zoning. To this end, an assignment between the text regions in the text ground truth and the corresponding regions in the OCR-generated text (matches) is sought. The method is based on a fault-tolerant string matching algorithm and can therefore tolerate OCR errors in the text. The segmentation errors are determined by evaluating the matching. Subsequently, the edit operations necessary to correct the recognized segmentation errors are computed to estimate the correction costs. Furthermore, SEE provides a version of the OCR-generated text in which the detected page segmentation errors are corrected.
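To make the idea of fault-tolerant matching between text ground truth and OCR output concrete, here is a minimal Python sketch. The abstract does not specify which string matching algorithm SEE uses, so plain Levenshtein edit distance is swapped in as a stand-in; the function names and the `max_error_rate` threshold are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def tolerant_match(truth: str, ocr: str, max_error_rate: float = 0.2) -> bool:
    """Treat two text regions as matching if their edit distance is
    small relative to the length of the longer one."""
    longest = max(len(truth), len(ocr))
    if longest == 0:
        return True
    return edit_distance(truth, ocr) / longest <= max_error_rate


print(edit_distance("kitten", "sitting"))
print(tolerant_match("Invoice total", "Invo1ce tota1"))
print(tolerant_match("Invoice total", "Page 3"))
```

A relative threshold lets a region with a few misread characters still match its ground-truth counterpart, while unrelated regions are rejected. The same distance could also serve as a rough proxy for correction cost, although the paper's own cost model is defined on segmentation edit operations rather than on characters.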

Collaboration

Top Co-Authors

Achim Ebert, Kaiserslautern University of Technology
Andreas Dengel, Kaiserslautern University of Technology
Andreas Abecker, Forschungszentrum Informatik
Michael Bender, Kaiserslautern University of Technology
Christian Höcht, Kaiserslautern University of Technology
Elisabeth Wolf, Technische Universität Darmstadt
Hans Hagen, Kaiserslautern University of Technology