Stefan Klink
German Research Centre for Artificial Intelligence
Publication
Featured research published by Stefan Klink.
International Journal on Document Analysis and Recognition | 2001
Stefan Klink; Thomas Kieninger
Abstract. Document image processing is a crucial process in office automation. It begins at the ‘OCR’ phase, with difficulties arising in document ‘analysis’ and ‘understanding’. This paper presents a hybrid and comprehensive approach to document structure analysis. It is hybrid in the sense that it uses both layout (geometrical) and textual features of a given document. These features form the basis of potential conditions, which in turn are used to express fuzzy-matched rules of an underlying rule base. Rules can be formulated on features observed within one specific layout object, but they can also express dependencies between different layout objects. In addition to its rule-driven analysis, which allows easy adaptation to specific domains with their specific logical objects, the system contains domain-independent markup algorithms for common objects (e.g., lists).
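The abstract describes conditions over layout and textual features that are combined into fuzzy-matched rules, some of which relate two layout objects. The following is a minimal sketch of that general idea, not the authors' system: the `LayoutObject` fields, the membership ramps and their breakpoints, the `title`/`author` rules, the 0.5 threshold, and the use of `min` as fuzzy AND are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class LayoutObject:
    # Hypothetical layout object: recognized text plus geometry (y grows downward).
    text: str
    y: float
    height: float
    font_size: float

def ramp(value: float, low: float, high: float) -> float:
    """Fuzzy membership that rises linearly from 0 at `low` to 1 at `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)

# Conditions on a single layout object (geometric and textual features).
def large_font(obj: LayoutObject) -> float:
    return ramp(obj.font_size, 10.0, 16.0)

def near_top(obj: LayoutObject) -> float:
    return 1.0 - ramp(obj.y, 50.0, 200.0)

def short_text(obj: LayoutObject) -> float:
    return 1.0 - ramp(len(obj.text.split()), 8, 20)

# Condition expressing a dependency between two layout objects.
def directly_below(upper: LayoutObject, lower: LayoutObject) -> float:
    gap = lower.y - (upper.y + upper.height)
    if gap < 0:
        return 0.0  # `lower` starts above the bottom of `upper`
    return 1.0 - ramp(gap, 5.0, 40.0)

Rule = Callable[[LayoutObject, List[LayoutObject]], float]

def title_rule(obj: LayoutObject, page: List[LayoutObject]) -> float:
    # Fuzzy AND is taken here as the minimum of the condition degrees.
    return min(large_font(obj), near_top(obj), short_text(obj))

def author_rule(obj: LayoutObject, page: List[LayoutObject]) -> float:
    # Relational rule: a short, small-font line right below a likely title.
    below_title = max(
        (min(title_rule(o, page), directly_below(o, obj)) for o in page if o is not obj),
        default=0.0,
    )
    return min(short_text(obj), 1.0 - large_font(obj), below_title)

RULES: Dict[str, Rule] = {"title": title_rule, "author": author_rule}

def classify(page: List[LayoutObject], threshold: float = 0.5) -> List[Optional[str]]:
    """Give each object its best-matching label, or None if no rule reaches the threshold."""
    labels: List[Optional[str]] = []
    for obj in page:
        scores = {label: rule(obj, page) for label, rule in RULES.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] >= threshold else None)
    return labels

page = [
    LayoutObject("Hybrid Document Structure Analysis", y=40, height=20, font_size=18),
    LayoutObject("Stefan Klink, Thomas Kieninger", y=70, height=12, font_size=11),
    LayoutObject(" ".join(["word"] * 30), y=300, height=200, font_size=10),
]
labels = classify(page)  # ["title", "author", None]
```

Adding a domain here would mean adding entries to the hypothetical `RULES` table rather than changing the matching code, which mirrors the adaptability the abstract attributes to a rule base.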
Document Analysis Systems | 2002
Stefan Klink; Armin Hust; Markus Junker; Andreas Dengel
Query expansion methods have been studied for a long time, with debatable success in many instances. In this paper, a new approach is presented based on term concepts learned from other queries. Two important issues with query expansion are addressed: the selection and the weighting of additional search terms. In contrast to other methods, the query under consideration is expanded by adding the terms most similar to the concepts of individual query terms, rather than terms similar to the complete query or directly similar to the query terms. Experiments have shown that this kind of query expansion yields notable improvements in retrieval effectiveness, measured by recall/precision, compared with the standard vector space model and with pseudo relevance feedback. This approach can be used to improve document retrieval in Digital Libraries, in Document Management Systems, on the WWW, etc.
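The abstract names two issues, selecting and weighting expansion terms, and expands each query term through its own learned concept. A minimal sketch under stated assumptions: the concept of a term is assumed here to be the normalized sum of the (normalized) vectors of documents judged relevant to earlier queries containing that term, and each query term contributes its `k` strongest concept terms scaled by `beta`. The function names, `k`, `beta`, and the toy vectors are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]

def normalize(v: Vector) -> Vector:
    """Scale a term-weight vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm else dict(v)

def learn_concepts(past_queries: List[List[str]], relevant_docs: List[List[Vector]]) -> Dict[str, Vector]:
    """Assumed concept of a term: normalized sum of the documents judged
    relevant to every earlier query that contained the term."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for terms, docs in zip(past_queries, relevant_docs):
        for term in set(terms):
            for doc in docs:
                for t, w in normalize(doc).items():
                    sums[term][t] += w
    return {term: normalize(dict(vec)) for term, vec in sums.items()}

def expand_query(query_terms: List[str], concepts: Dict[str, Vector],
                 k: int = 2, beta: float = 0.5) -> Vector:
    """Expand term by term: add the k strongest terms of each query term's concept."""
    original = set(query_terms)
    expanded: Vector = {t: 1.0 for t in query_terms}
    for term in query_terms:
        concept = concepts.get(term)
        if not concept:
            continue  # no learned concept: keep the term unexpanded
        # Selection: strongest concept terms not already in the query (ties by name).
        candidates = sorted((t for t in concept if t not in original),
                            key=lambda t: (-concept[t], t))[:k]
        for t in candidates:
            # Weighting: concept weight scaled by beta, summed across query terms.
            expanded[t] = expanded.get(t, 0.0) + beta * concept[t]
    return expanded

d1 = {"car": 2.0, "engine": 1.0, "vehicle": 1.0}
d2 = {"car": 1.0, "automobile": 2.0}
concepts = learn_concepts([["car"]], [[d1, d2]])
expanded = expand_query(["car", "price"], concepts)
```

In this toy run, "car" is expanded with "automobile" and "engine", while "price", which never appeared in an earlier query, is left as-is.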
European Conference on Machine Learning | 2002
Stefan Klink; Armin Hust; Markus Junker; Andreas Dengel
Information retrieval systems have been studied in computer science for decades. The traditional ad-hoc task is to find all documents relevant to a given ad-hoc query, but the accuracy of ad-hoc document retrieval systems has plateaued in recent years. At DFKI, we are working on so-called collaborative information retrieval (CIR) systems, which unintrusively learn from their users' search processes. In this paper, a new approach called term-based concept learning (TCL) is presented, which learns conceptual descriptions of the terms occurring in known queries. A new query is expanded term by term using the previously learned concepts. Experiments have shown that TCL, alone and combined with pseudo relevance feedback, yields notable improvements in retrieval effectiveness, measured by recall/precision, compared with the standard vector space model and with pseudo relevance feedback. This approach can be used to improve document retrieval in Digital Libraries, in Document Management Systems, on the WWW, etc.
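The evaluation in this abstract compares against two baselines, the standard vector space model and pseudo relevance feedback, and reports recall/precision. A minimal sketch of those baselines, assuming raw term-frequency vectors, cosine ranking, and a Rocchio-style feedback step that treats the top `n` documents as relevant. The parameters `n`, `alpha`, `beta`, the helper names, and the toy collection are hypothetical, and because the abstract does not say how TCL is combined with feedback, that combination is not modeled.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = Dict[str, float]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query: Vector, docs: Dict[str, Vector]) -> List[str]:
    """Vector space model: sort document ids by cosine similarity (ties by id)."""
    return sorted(docs, key=lambda d: (-cosine(query, docs[d]), d))

def pseudo_relevance_feedback(query: Vector, docs: Dict[str, Vector],
                              n: int = 1, alpha: float = 1.0, beta: float = 0.75) -> Vector:
    """Rocchio-style update that assumes the top-n ranked documents are relevant."""
    top = rank(query, docs)[:n]
    new_query = {t: alpha * w for t, w in query.items()}
    for d in top:
        for t, w in docs[d].items():
            new_query[t] = new_query.get(t, 0.0) + beta * w / len(top)
    return new_query

def precision_recall(ranking: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision and recall over the first k ranked documents."""
    hits = len(set(ranking[:k]) & relevant)
    return hits / k, hits / len(relevant)

docs = {
    "d1": {"car": 1.0, "engine": 1.0},
    "d2": {"banana": 1.0},
    "d3": {"automobile": 1.0, "engine": 1.0},
}
query = {"car": 1.0}
relevant = {"d1", "d3"}
baseline = rank(query, docs)
feedback = rank(pseudo_relevance_feedback(query, docs), docs)
```

Here feedback pulls "engine" into the query from the top-ranked document, which lifts `d3` into the top two and raises both precision and recall at k = 2 on this toy collection.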
International Conference on Document Analysis and Recognition | 1999
Stefan Klink; Thorsten Jäger
In this paper we present a comprehensive voting approach that takes entire layouts obtained from commercial OCR devices as input. Such a layout comprises segments of three kinds: lines, words, and characters. By combining all attributes of a segment (e.g., recognized text, font height), we obtain a better layout that represents the original page layout as well as possible. The voting process itself is hierarchically organized, starting with the line segments. For each level, a search tree is spawned and all fellow segments (segments from different layouts that denote the same image area) are established. A heuristic search method is used, guided by a similarity measure defined on segments. Deviations in segmentation, as well as segmentation errors of individual commercial OCR devices, are compensated by an equalization module.
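The abstract describes voting over several OCR layouts: fellow segments that cover the same image area are located with a similarity measure, and their attributes are combined. The sketch below is a simplified illustration, not the paper's method. It handles only the line level, replaces the heuristic tree search with a greedy best match per layout on bounding-box intersection over union, and assumes a hypothetical `Segment` type, a 0.5 similarity threshold, majority voting on text, and medians for geometry and font height. No equalization module is modeled, and lines missing from the first (reference) layout are not recovered.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # x0, y0, x1, y1

@dataclass
class Segment:
    box: Box
    text: str
    font_height: float

def area(r: Box) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])

def overlap(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, used as the segment similarity."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def find_fellows(seg: Segment, other_layouts: List[List[Segment]], min_sim: float = 0.5) -> List[Segment]:
    """Greedily pick, from each other layout, the segment most similar to `seg`."""
    fellows = [seg]
    for layout in other_layouts:
        best: Optional[Segment] = max(layout, key=lambda s: overlap(seg.box, s.box), default=None)
        if best is not None and overlap(seg.box, best.box) >= min_sim:
            fellows.append(best)
    return fellows

def vote(fellows: List[Segment]) -> Segment:
    # Majority text; on ties, most_common keeps the first one encountered (the reference).
    text, _ = Counter(s.text for s in fellows).most_common(1)[0]
    box = tuple(median(s.box[i] for s in fellows) for i in range(4))
    return Segment(box, text, median(s.font_height for s in fellows))

def vote_layouts(layouts: List[List[Segment]], min_sim: float = 0.5) -> List[Segment]:
    reference, others = layouts[0], layouts[1:]
    return [vote(find_fellows(seg, others, min_sim)) for seg in reference]

ocr_a = [Segment((10, 10, 200, 30), "Document analysis", 12), Segment((10, 40, 150, 60), "with voting", 12)]
ocr_b = [Segment((11, 10, 201, 31), "Docurnent analysis", 12), Segment((10, 41, 150, 60), "with voting", 12)]
ocr_c = [Segment((10, 11, 199, 30), "Document analysis", 13), Segment((12, 40, 150, 61), "with vcting", 11)]
result = vote_layouts([ocr_a, ocr_b, ocr_c])
```

Each device misreads a different line here, so the majority vote recovers "Document analysis" and "with voting" even though no single input has both lines correct.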
Document Analysis Systems | 2000
Stefan Klink; Andreas Dengel; Thomas Kieninger
Text Mining | 2003
Armin Hust; Stefan Klink; Markus Junker; Andreas Dengel
Archive | 2001
Stefan Klink
Archive | 2002
Armin Hust; Stefan Klink; Markus Junker; Andreas Dengel
Archive | 2002
Stefan Klink; Armin Hust; Markus Junker
Archive | 2002
Markus Junker; Armin Hust; Stefan Klink