Abderrazak Zahour
University of Le Havre
Publication
Featured research published by Abderrazak Zahour.
International Journal on Document Analysis and Recognition | 2007
Laurence Likforman-Sulem; Abderrazak Zahour; Bruno Taconet
There is a huge number of historical documents in libraries and in various national archives that have not yet been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is segmenting the document into text lines. Because of the low quality and complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods developed during the last decade and dedicated to documents of historical interest.
Pattern Recognition Letters | 2010
Sami Ben Moussa; Abderrazak Zahour; Abdellatif Benabdelhafid; Adel M. Alimi
In this work, a new method based on global texture analysis is proposed for the widely neglected problem of Arabic font recognition. The method relies on fractal geometry, and the feature extraction does not depend on the document content. We treat the document as an image containing specific textures and regard font recognition as texture identification. We combine two techniques, BCD (box counting dimension) and DCD (dilation counting dimension), to obtain the main features. The first expresses the texture distribution in the 2-D image. The second takes into account the human visual system, since it makes it possible to differentiate one font from another. Both features are expressed in parametric form, from which four features are kept. Experiments are carried out on 1000 samples of 10 typefaces (each typeface in four sizes). The average recognition rates are about 96.2% using KNN (k-nearest neighbors) and 98% using RBF (radial basis function). Experimental results on the robustness of the method against text size, skew, image degradation (e.g., Gaussian noise) and resolution are also included, along with a comparison with existing methods. The main advantages of our method are that (1) the dimension of the feature vector is very low; (2) it is robust to variations in the size of the studied blocks (which are not standardized); (3) fewer samples are needed to train the classifier; and (4) finally, and most importantly, it is the first attempt to apply and adapt fractal dimensions to font recognition.
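The box counting dimension (BCD) mentioned above can be estimated by covering a binarized text image with square boxes of several sizes and fitting the slope of log N(s) against log(1/s), where N(s) is the number of boxes of side s that contain ink. The Python sketch below shows only this textbook estimator; the box sizes, the list-of-lists image representation and the function name are assumptions, and neither the DCD feature nor the paper's parametric form is reproduced.

```python
import math


def box_counting_dimension(image, sizes=(1, 2, 4, 8)):
    """Estimate the box counting dimension of a binary image.

    image: list of rows, each a list of 0/1 values (1 = ink pixel).
    sizes: box side lengths in pixels (illustrative choice, not from the paper).
    """
    height, width = len(image), len(image[0])
    log_inv_size, log_count = [], []
    for s in sizes:
        # N(s): number of s x s boxes containing at least one ink pixel.
        count = 0
        for top in range(0, height, s):
            for left in range(0, width, s):
                if any(image[r][c]
                       for r in range(top, min(top + s, height))
                       for c in range(left, min(left + s, width))):
                    count += 1
        if count:
            log_inv_size.append(math.log(1.0 / s))
            log_count.append(math.log(count))
    # The least-squares slope of log N(s) against log(1/s) is the estimate.
    n = len(log_count)
    mean_x = sum(log_inv_size) / n
    mean_y = sum(log_count) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inv_size, log_count))
    den = sum((x - mean_x) ** 2 for x in log_inv_size)
    return num / den
```

A completely filled region yields a dimension of 2 and a single straight stroke yields 1; text textures typically fall in between, which is what makes such a value usable as a texture feature.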
International Conference on Document Analysis and Recognition | 2007
Abderrazak Zahour; Laurence Likforman-Sulem; W. Boussalaa; Bruno Taconet
This paper presents a text line segmentation method for printed or handwritten historical Arabic documents. Documents are first classified into two classes using a K-means scheme. These classes correspond to document complexity (easy or not easy to segment). A document that includes overlapping and touching characters is then divided into vertical strips. The text blocks extracted by horizontal projection are classified into three categories: small, average and large. After the large text blocks are segmented, the lines are obtained by matching adjacent blocks within two successive strips using spatial relationships. Documents without overlapping or touching characters are segmented without the large-text-block segmentation module. The method achieves 96% accuracy on a collection of 100 historical documents.
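The strip-and-projection step described above can be illustrated with a short sketch: a binary image is cut into vertical strips, the horizontal projection profile of each strip is computed, runs of non-empty rows become text blocks, and blocks are labeled small, average or large relative to the mean block height. The thresholds (0.5 and 1.5 times the mean height) and all function names are hypothetical; the paper's actual classification rule and the block matching between strips are not shown.

```python
def split_into_strips(image, n_strips):
    """Cut a binary image (list of 0/1 rows) into n_strips vertical strips."""
    width = len(image[0])
    step = -(-width // n_strips)  # ceiling division
    return [[row[x:x + step] for row in image] for x in range(0, width, step)]


def strip_blocks(strip):
    """Return (start_row, end_row) runs of rows that contain ink in one strip."""
    profile = [sum(row) for row in strip]  # horizontal projection profile
    blocks, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            blocks.append((start, i - 1))
            start = None
    if start is not None:
        blocks.append((start, len(profile) - 1))
    return blocks


def classify_blocks(blocks, low=0.5, high=1.5):
    """Label blocks small/average/large relative to the mean block height.

    The low/high factors are hypothetical thresholds, not taken from the paper.
    """
    heights = [end - start + 1 for start, end in blocks]
    mean_h = sum(heights) / len(heights)
    labels = []
    for h in heights:
        if h < low * mean_h:
            labels.append("small")
        elif h > high * mean_h:
            labels.append("large")
        else:
            labels.append("average")
    return labels
```

In this sketch a "large" block would be a candidate for the further splitting the abstract describes, since it likely spans several touching or overlapping lines.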
ACM Symposium on Applied Computing | 2007
Wafa Boussellaa; Abderrazak Zahour; Adel M. Alimi
This paper presents a new color document image segmentation system suited to historical Arabic manuscripts. The system uses a hybrid method that couples a background light intensity normalization algorithm with k-means clustering and maximum likelihood (ML) estimation for foreground/background separation. First, the background normalization algorithm separates the foreground from the background; this foreground is used in later steps. Second, the algorithm operates on luminance and distorts the contrast; these distortions are corrected by gamma correction and contrast adjustment. Finally, the enhanced foreground image is segmented into foreground and background on the basis of ML estimation, whose initial parameters are estimated by the k-means clustering algorithm. The segmented image is used to produce the final restored document image. The techniques are tested on a set of historical Arabic manuscripts from the National Tunisian Library. The performance of the algorithm is demonstrated on real color manuscripts degraded by show-through effects, uneven background color and localized spots.
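The gamma correction and contrast adjustment step can be sketched on 8-bit grey levels as follows. The gamma value and the use of a simple min-max linear stretch are assumptions made for illustration; the abstract does not specify the exact correction used, and the background normalization and ML segmentation stages are not reproduced here.

```python
def gamma_correct(pixels, gamma):
    """Apply gamma correction to 8-bit grey levels: out = 255 * (in / 255) ** gamma."""
    return [round(255 * (p / 255) ** gamma) for p in pixels]


def stretch_contrast(pixels):
    """Linearly map the observed min..max grey range onto 0..255 (assumed method)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # Flat input: nothing to stretch.
        return list(pixels)
    return [round(255 * (p - lo) / (hi - lo)) for p in pixels]
```

A gamma below 1 brightens mid-tones and a gamma above 1 darkens them; the stretch then spreads the remaining grey levels over the full 0 to 255 range.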
Pattern Analysis and Applications | 2009
Abderrazak Zahour; Bruno Taconet; Laurence Likforman-Sulem; Wafa Boussellaa
This paper presents a new approach to text-line segmentation based on Block Covering, which solves the problem of overlapping and multi-touching components. Block Covering is the core of a system that processes a set of ancient Arabic documents from historical archives. The system is designed to separate text lines even when they overlap and touch at multiple points. We exploit the Block Covering technique in three steps: a new fractal analysis (Block Counting) for document classification, a statistical analysis of block heights for block classification, and a neighborhood analysis for building text lines. The Block Counting fractal analysis, combined with a fuzzy C-means scheme, is performed on document images to classify them according to their complexity: tightly (closely) spaced documents (TSD) or widely spaced documents (WSD). An optimal Block Covering is applied to TSD documents, which include overlapping and multi-touching lines. The large blocks generated by the covering are then segmented on the basis of the statistical analysis of block heights. The final labeling into text lines is based on a block neighborhood analysis. Experimental results on images from the Tunisian Historical Archives demonstrate the feasibility of the Block Covering technique for segmenting ancient Arabic documents.
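The fuzzy C-means step used to separate tightly spaced (TSD) from widely spaced (WSD) documents can be sketched on a single scalar feature per document. In the paper that feature comes from the Block Counting fractal analysis; here the input values, the fuzzifier m = 2, the initialization and the function name are assumptions, and the sketch implements only the standard fuzzy C-means updates.

```python
def fuzzy_c_means_1d(values, c=2, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy C-means on scalar features (requires c >= 2).

    Returns (centers, memberships) where memberships[k][i] is the degree to
    which values[k] belongs to cluster i; each membership row sums to 1.
    """
    # Deterministic initialisation: spread the centers evenly over the range.
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (c - 1) for i in range(c)]
    memberships = []
    for _ in range(iters):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - v) for v in centers]
            zeros = dists.count(0.0)
            if zeros:
                # Point lies exactly on one or more centers: share membership.
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in dists)
                       for di in dists]
            memberships.append(row)
        # Center update: mean of the data weighted by membership ** m.
        new_centers = []
        for i in range(c):
            weights = [row[i] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

Taking the cluster with the highest membership gives a hard two-class decision per document, while the membership degrees show how borderline a document is.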
International Conference on Pattern Recognition | 2008
S. Ben Moussa; Abderrazak Zahour; Abdellatif Benabdelhafid; Adel M. Alimi
In this paper, we present automatic multilingual identification of Arabic and Latin scripts in both handwritten and printed text. The proposed scheme is based, first, on a morphological transform of text-line images and, second, on fractal analysis features of both (i) the original texture of the 2-D images and (ii) their vertical and horizontal projection profiles. Two techniques are used to obtain only 12 features based on fractal multi-dimension. The proposed system has been tested on 1000 prototypes with various typefaces, writer styles and sizes. The discrimination accuracy is about 96.64% using KNN and 98.72% using RBF. Experimental results show the relevance of the proposed approach.
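Of the two classifiers named above, the k-nearest-neighbor (KNN) rule is simple enough to sketch directly: a sample receives the majority label among its k closest training feature vectors. The function name and the default k = 3 are assumptions; the feature vectors passed to it are assumed to be the fractal features described in the abstract, and the RBF classifier is not shown.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training samples nearest to query.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With two classes and an odd k, the vote cannot tie, which is one reason odd values of k are a common default.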
International Conference on Pattern Recognition | 2010
Wafa Boussellaa; Abderrazak Zahour; Haikal Elabed; Abdellatif Benabdelhafid; Adel M. Alimi
This paper presents a new method for automatic text-line extraction from historical handwritten Arabic documents that suffer from overlapping and multi-touching characters. Our approach is based on block covering analysis using unsupervised techniques. The algorithm first performs a statistical block analysis that computes the optimal number of vertical strips into which the document is decomposed. It then performs fuzzy baseline detection using the fuzzy C-means algorithm. Finally, each block is assigned to its corresponding line. Experimental results show that the proposed method achieves a high accuracy of about 95% in detecting text lines in historical handwritten Arabic document images written in different scripts.
International Conference on Document Analysis and Recognition | 2009
Wafa Boussellaa; Aymen Bougacha; Abderrazak Zahour; Haikal El Abed; Adel M. Alimi
This paper presents a new enhanced algorithm for extracting text from degraded document images based on probabilistic models. The observed document image is modeled as a mixture of Gaussian densities representing the foreground and background components of the document image. The EM algorithm is used to estimate and iteratively refine the parameters of the mixture densities. The initial parameters of the EM algorithm are estimated by the k-means clustering method. After parameter estimation, the document image is partitioned into text and background classes by means of a maximum likelihood (ML) approach. The performance of the proposed approach is evaluated on a variety of degraded documents from the collections of the National Library of Tunisia.
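The k-means initialization, EM estimation and ML partitioning described above can be sketched for grey levels modeled as a two-component 1-D Gaussian mixture. This is a generic implementation of the technique under stated assumptions: pixels are reduced to scalar grey levels, ink is assumed darker than the background (so the lower-mean component is labeled text), and all function names, iteration limits and the variance floor are hypothetical choices rather than the authors' settings.

```python
import math


def gaussian(x, mean, var):
    """1-D normal density."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def kmeans_two(values, iters=50):
    """Two-cluster k-means on grey levels; returns (centers, groups)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for x in values:
            groups[0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1].append(x)
        new_centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


def em_two_gaussians(values, iters=100, tol=1e-6, min_var=1e-6):
    """Fit a two-component Gaussian mixture with EM, initialised by k-means."""
    means, groups = kmeans_two(values)
    weights = [len(g) / len(values) for g in groups]
    variances = [max(sum((x - mu) ** 2 for x in g) / max(len(g), 1), min_var)
                 for g, mu in zip(groups, means)]
    for _ in range(iters):
        # E-step: responsibility of each component for each grey level.
        resp = []
        for x in values:
            p = [w * gaussian(x, mu, var) for w, mu, var in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixture weights, means and variances.
        new_w, new_m, new_v = [], [], []
        for j in range(2):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mu = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - mu) ** 2 for r, x in zip(resp, values)) / nj
            new_w.append(nj / len(values))
            new_m.append(mu)
            new_v.append(max(var, min_var))
        shift = max(abs(a - b) for a, b in zip(means, new_m))
        weights, means, variances = new_w, new_m, new_v
        if shift < tol:
            break
    return weights, means, variances


def classify_pixels(values, means, variances):
    """ML decision per grey level; the darker (lower-mean) component is text."""
    text = 0 if means[0] < means[1] else 1
    labels = []
    for x in values:
        lik = [gaussian(x, mu, var) for mu, var in zip(means, variances)]
        best = 0 if lik[0] >= lik[1] else 1
        labels.append("text" if best == text else "background")
    return labels
```

The ML decision compares only the component likelihoods; weighting them by the mixture proportions would turn it into a maximum a posteriori rule, a variant the abstract does not mention.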
Document Analysis Systems | 1998
Saddok Kebairi; Bruno Taconet; Abderrazak Zahour; Saïd Ramdane
In this paper, we present a statistical approach to classifying forms whose physical structure may vary from one writer to another. Automatic form segmentation is performed to extract the physical structure, which is described by the set of main rectangular blocks. During the form learning phase, block matching is performed within each class; the number of occurrences of each block is counted, and statistical block attributes are computed. During the identification phase, we handle block instability by introducing a block penalty coefficient that modifies the classical expression of the Mahalanobis distance. The block penalty coefficient depends on the block occurrence probability. Experimental results on different form types are given.
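The abstract does not give the modified distance explicitly, so the sketch below is a hypothetical illustration of the idea rather than the authors' formula: matched blocks contribute a diagonal-covariance squared Mahalanobis term, and each model block missing from the observed form adds a penalty proportional to its learned occurrence probability, so that missing a block that almost always appears costs more than missing an optional one. The dictionary layout, the missing_cost constant and the function names are all assumptions.

```python
def block_mahalanobis(block, mean, var):
    """Squared Mahalanobis distance assuming a diagonal covariance matrix."""
    return sum((x - m) ** 2 / v for x, m, v in zip(block, mean, var))


def form_distance(observed, model, missing_cost=10.0):
    """Hypothetical penalised distance between an observed form and one class.

    observed: dict block_id -> attribute vector (e.g. x, y, width, height).
    model: dict block_id -> (occurrence_probability, mean_vector, variance_vector).
    Observed blocks absent from the model are ignored in this sketch.
    """
    total = 0.0
    for block_id, (p_occ, mean, var) in model.items():
        if block_id in observed:
            total += block_mahalanobis(observed[block_id], mean, var)
        else:
            # Assumed penalty: a usually present block that is missing costs more.
            total += p_occ * missing_cost
    return total


def classify_form(observed, models, missing_cost=10.0):
    """Return the class name whose model gives the smallest penalised distance."""
    return min(models, key=lambda name: form_distance(observed, models[name], missing_cost))
```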
International Conference on Document Analysis and Recognition | 2007
Wafa Boussellaa; Abderrazak Zahour; Bruno Taconet; Adel M. Alimi; Abdellatif Benabdelhafid
This paper presents PRAAD, a new system for the preprocessing and analysis of Arabic historical documents. It comprises two main parts: preprocessing and analysis of ancient documents. After digitization, color or greyscale images of ancient documents are distorted by strong background artefacts such as optical scan blur and noise, show-through and bleed-through effects, and spots. To preserve and exploit these cultural heritage documents, we aim to create an efficient tool that performs restoration and binarisation and analyses the document layout. The tool was developed by adapting our expertise in image processing of ancient Arabic documents, both printed and handwritten. The different functions of the PRAAD system are tested on a set of ancient Arabic documents from the National Library and the National Archives of Tunisia.