Karim Hadjar
University of Fribourg
Publication
Featured research published by Karim Hadjar.
international conference on document analysis and recognition | 2003
Karim Hadjar; Rolf Ingold
The aim of layout analysis is to extract the geometric structure from a document image. It consists of labeling homogenous regions of a document image. This paper describes the performance of segmentation algorithms and their adaptation in order to treat complex structured Arabic documents such as newspapers. Experimental tests have been carried out on four different phases of newspaper image analysis: thread recognition, frame recognition, image text separation, text line recognition, and line merging into blocks. Some promising experimental results are reported.
international conference on document analysis and recognition | 2005
Maurizio Rigamonti; Jean-Luc Bloechle; Karim Hadjar; Denis Lalanne; Rolf Ingold
This article presents Xed, a reverse engineering tool for PDF documents, which extracts the original document layout structure. Xed mixes electronic extraction methods with state-of-the-art document analysis techniques and outputs the layout structure in a hierarchical canonical form, i.e., one that is universal and independent of the document type. This article first reviews the major traps and tricks of the PDF format. It then introduces the architecture of Xed along with its main modules and, in particular, the document physical structure extraction algorithm. Next, a canonical format is proposed and discussed with an example. Finally, the results of a practical evaluation are presented, followed by an outline of future work on logical structure extraction.
international conference on document analysis and recognition | 2001
Karim Hadjar; Oliver Hitz; Rolf Ingold
Indexing large newspaper archives requires automatic page decomposition algorithms with high accuracy. In this paper, we present our approach to an automatic page decomposition algorithm developed for the First International Newspaper Segmentation Contest. Our approach decomposes the newspaper image into image regions, horizontal and vertical lines, text regions and title areas. Experimental results are obtained from the data set of the contest.
document analysis systems | 2006
Jean-Luc Bloechle; Maurizio Rigamonti; Karim Hadjar; Denis Lalanne; Rolf Ingold
Accessing the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods to accomplish this task, which are based either on document image analysis or on electronic content extraction. Then XCDF, a canonical format with well-defined properties, is proposed as a suitable solution for representing structured electronic documents and as an entry point for further research. The system and methods used for reverse engineering PDF documents into this canonical format are also presented. Finally, we present current applications of this work in various domains, ranging from data mining to multimedia navigation, all of which benefit from our canonical format for accessing PDF document content and structures.
document analysis systems | 2002
Karim Hadjar; Oliver Hitz; Lyse Robadey; Rolf Ingold
This paper describes 2(CREM), a recognition method for documents with complex structures that allows incremental learning in an interactive environment. The classification is driven by a model, which contains a static as well as a dynamic part and evolves with use. The first prototype of 2(CREM) has been tested on four different phases of newspaper image analysis: line segment recognition, frame recognition, line merging into blocks, and logical labeling. Some promising experimental results are reported.
document analysis systems | 2004
Karim Hadjar; Rolf Ingold
This paper describes PLANET, a recognition method for Arabic documents with complex structures that allows incremental learning in an interactive environment. The classification is driven by artificial neural nets, each specialized in a document model. The first prototype of PLANET has been tested on five different phases of newspaper image analysis: thread recognition, frame recognition, image text separation, text line recognition, and line merging into blocks. The learning capability has been tested on line merging into blocks. Some promising experimental results are reported.
international conference on document analysis and recognition | 2005
Karim Hadjar; Rolf Ingold
Logical structure analysis is an important phase in the process of document image understanding. In this paper we propose a learning-based method to label logical components in Arabic newspaper documents. The labeling is driven by artificial neural nets, each specialized in a document class. The first prototype of LUNET has been tested on a set of Arabic newspapers of three document classes. Some promising experimental results are reported.
Archive | 2016
Karim Hadjar
The huge amount of information available on the Internet (and intranets) and its unstructured nature have reached a point where action is needed to ease the use of queries within a web search engine. Introducing order, organization, and structure is necessary for processing this information. One step toward this goal is the use of ontologies for specific areas or domains. The word ontology is becoming widespread, and its use in organizing the web is gaining momentum. Many scientists are working on semantic webs, which are considered intelligent and meaningful webs, but the lack of a university ontology led the author to develop one. A case study was developed to validate the ontology at Ahlia University, Bahrain. The results are presented.
international conference on document analysis and recognition | 2011
Karim Hadjar; Rolf Ingold
This paper describes the adaptation of a previously developed document recognition framework called PLANET (Physical Layout Analysis of complex structured Arabic documents using artificial neural NETs) into a ground truthing system for complex Arabic document images [8]. PLANET is a layout analysis tool for Arabic documents with complex structures allowing incremental learning in an interactive environment. Artificial neural nets drive the classification of homogeneous text blocks. We have observed that when users use PLANET for ground truthing, the number of interactive corrections is quite large. In order to reduce user intervention and to make use of PLANET as a ground truthing system we have adapted its architecture.
document analysis systems | 2010
Karim Hadjar; Rolf Ingold
PDF documents are widely used, but the extraction and manipulation of their structured content is not an easy task. It requires sophisticated pre-processing and reverse engineering techniques. In this paper, we present an improvement of XED that handles unresolved issues related to the analysis of Arabic documents. A set of rules was proposed and implemented to enhance the extraction of Arabic content by taking care of the different Arabic fonts, mapping the un-interpreted Unicode values to the other interpreted sets, and applying a reverse algorithm whenever needed. Finally, we present a concrete evaluation of the improved XED.
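The two ideas named in the abstract above, mapping font-specific code points to interpretable letters and reversing text when needed, can be illustrated with a minimal Python sketch. This is not the XED rule set, which the abstract does not detail. It assumes the extracted glyphs are Unicode Arabic presentation forms stored in visual (left-to-right) order, and it swaps in standard NFKC normalization for the paper's mapping rules. The function name `to_logical_text` and the `visual_order` flag are hypothetical.

```python
import unicodedata


def to_logical_text(extracted: str, visual_order: bool = True) -> str:
    """Turn glyph-level Arabic text into base letters in logical order.

    Assumption: `extracted` holds Arabic presentation forms, as some PDF
    fonts emit them. This is an illustration, not the XED algorithm.
    """
    # Step 1: undo visual ordering with a plain character reversal.
    # Reversing before normalizing keeps ligatures such as lam-alef
    # intact, so they expand in the correct letter order in step 2.
    text = extracted[::-1] if visual_order else extracted
    # Step 2: NFKC folds positional presentation forms (initial, medial,
    # final, isolated) and ligatures back to their base letters.
    return unicodedata.normalize("NFKC", text)


# "salam" as it might be extracted in visual order: isolated meem,
# final lam-alef ligature, initial seen.
visual = "\uFEE1\uFEFC\uFEB3"
logical = to_logical_text(visual)
print([unicodedata.name(ch) for ch in logical])
```

A plain reversal is only adequate for runs that are purely Arabic. Lines mixing Arabic with digits or Latin text would need a proper bidirectional reordering step, which this sketch leaves out.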