Publication


Featured research published by Karim Hadjar.


International Conference on Document Analysis and Recognition | 2003

Arabic newspaper page segmentation

Karim Hadjar; Rolf Ingold

The aim of layout analysis is to extract the geometric structure from a document image. It consists of labeling homogenous regions of a document image. This paper describes the performance of segmentation algorithms and their adaptation in order to treat complex structured Arabic documents such as newspapers. Experimental tests have been carried out on four different phases of newspaper image analysis: thread recognition, frame recognition, image text separation, text line recognition, and line merging into blocks. Some promising experimental results are reported.


International Conference on Document Analysis and Recognition | 2005

Towards a canonical and structured representation of PDF documents through reverse engineering

Maurizio Rigamonti; Jean-Luc Bloechle; Karim Hadjar; Denis Lalanne; Rolf Ingold

This article presents Xed, a reverse engineering tool for PDF documents, which extracts the original document layout structure. Xed mixes electronic extraction methods with state-of-the-art document analysis techniques and outputs the layout structure in a hierarchical canonical form, i.e. one that is universal and independent of the document type. This article first reviews the major traps and tricks of the PDF format. It then introduces the architecture of Xed along with its main modules, in particular the algorithm that extracts the document's physical structure. A canonical format is then proposed and discussed with an example. Finally, the results of a practical evaluation are presented, followed by an outline of future work on logical structure extraction.


International Conference on Document Analysis and Recognition | 2001

Newspaper page decomposition using a split and merge approach

Karim Hadjar; Oliver Hitz; Rolf Ingold

Indexing large newspaper archives requires automatic page decomposition algorithms with high accuracy. In this paper, we present our approach to an automatic page decomposition algorithm developed for the First International Newspaper Segmentation Contest. Our approach decomposes the newspaper image into image regions, horizontal and vertical lines, text regions and title areas. Experimental results are obtained from the data set of the contest.


Document Analysis Systems | 2006

XCDF: a canonical and structured document format

Jean-Luc Bloechle; Maurizio Rigamonti; Karim Hadjar; Denis Lalanne; Rolf Ingold

Accessing the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods to accomplish this task, which are based either on document image analysis or on electronic content extraction. Then XCDF, a canonical format with well-defined properties, is proposed as a suitable solution for representing structured electronic documents and as an entry point for further research. The system and methods used for reverse engineering PDF documents into this canonical format are also presented. We finally present current applications of this work in various domains, ranging from data mining to multimedia navigation, all of which benefit from our canonical format to access PDF document content and structures.


Document Analysis Systems | 2002

Configuration REcognition Model for Complex Reverse Engineering Methods: 2(CREM)

Karim Hadjar; Oliver Hitz; Lyse Robadey; Rolf Ingold

This paper describes 2(CREM), a recognition method for documents with complex structures that allows incremental learning in an interactive environment. The classification is driven by a model that contains a static as well as a dynamic part and evolves with use. The first prototype of 2(CREM) has been tested on four different phases of newspaper image analysis: line segment recognition, frame recognition, line merging into blocks, and logical labeling. Some promising experimental results are reported.


Document Analysis Systems | 2004

Physical Layout Analysis of Complex Structured Arabic Documents Using Artificial Neural Nets

Karim Hadjar; Rolf Ingold

This paper describes PLANET, a recognition method for Arabic documents with complex structures that allows incremental learning in an interactive environment. The classification is driven by artificial neural nets, each specialized in a document model. The first prototype of PLANET has been tested on five different phases of newspaper image analysis: thread recognition, frame recognition, image text separation, text line recognition, and line merging into blocks. The learning capability has been tested on line merging into blocks. Some promising experimental results are reported.


International Conference on Document Analysis and Recognition | 2005

Logical labeling of Arabic newspapers using artificial neural nets

Karim Hadjar; Rolf Ingold

Logical structure analysis is an important phase in the process of document image understanding. In this paper we propose a learning-based method to label logical components in Arabic newspaper documents. The labeling is driven by artificial neural nets, each specialized in a document class. The first prototype of LUNET has been tested on a set of Arabic newspapers of three document classes. Some promising experimental results are reported.


Archive | 2016

University Ontology: A Case Study at Ahlia University

Karim Hadjar

The huge amount of information available on the Internet (and intranets) and its unstructured nature have reached a point where action is needed to ease querying with a web search engine. Introducing order, organization, and structure is necessary for processing this information. One step toward this goal is the use of ontologies for specific areas or domains. The word ontology is becoming widespread, and its use in organizing the web is gaining momentum. Many scientists are working on semantic webs, which are considered intelligent and meaningful webs, but the lack of a university ontology led the author to develop one. A case study was carried out at Ahlia University, Bahrain, to validate the ontology. The results are presented.


International Conference on Document Analysis and Recognition | 2011

Minimizing User Annotations in the Generation of Layout Ground-Truthed Data

Karim Hadjar; Rolf Ingold

This paper describes the adaptation of a previously developed document recognition framework called PLANET (Physical Layout Analysis of complex structured Arabic documents using artificial neural NETs) into a ground truthing system for complex Arabic document images [8]. PLANET is a layout analysis tool for Arabic documents with complex structures that allows incremental learning in an interactive environment. Artificial neural nets drive the classification of homogeneous text blocks. We have observed that when users use PLANET for ground truthing, the number of interactive corrections is quite large. To reduce user intervention and to make PLANET usable as a ground truthing system, we have adapted its architecture.


Document Analysis Systems | 2010

Improving XED for extracting content from Arabic PDFs

Karim Hadjar; Rolf Ingold

PDF documents are widely used, but extracting and manipulating their structured content is not an easy task. It requires sophisticated pre-processing and reverse engineering techniques. In this paper, we present an improvement of XED that handles unresolved issues related to the analysis of Arabic documents. A set of rules was proposed and implemented to enhance the extraction of Arabic content by taking care of the different Arabic fonts, mapping the un-interpreted Unicode values to the other interpreted sets, and applying a reverse algorithm whenever needed. Finally, we present concrete evaluations of the improved XED.

Collaboration


Dive into Karim Hadjar's collaborations.

Top Co-Authors

Rolf Ingold

University of Fribourg

Oliver Hitz

University of Fribourg
