Maroua Mehri
University of La Rochelle
Publication
Featured research published by Maroua Mehri.
International Conference on Document Analysis and Recognition | 2015
Jean-Christophe Burie; Joseph Chazalon; Mickaël Coustaty; Sébastien Eskenazi; Muhammad Muzzamil Luqman; Maroua Mehri; Nibal Nayef; Jean-Marc Ogier; Sophea Prum; Marçal Rusiñol
Smartphones are enabling new ways of capture, hence the need for seamless and reliable acquisition and digitization of documents in order to convert them into editable, searchable and more human-readable formats. Current state-of-the-art work lacks databases and baseline benchmarks for digitizing mobile-captured documents. We have organized a competition for mobile document capture and OCR in order to address this issue. The competition is structured into two independent challenges: smartphone document capture and smartphone OCR. This report describes the datasets for both challenges along with their ground truth, details the performance evaluation protocols we used, and presents the final results of the participating methods. In total, we received 13 submissions: 8 for challenge 1 and 5 for challenge 2.
Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing | 2013
Maroua Mehri; Petra Gomez-Krämer; Pierre Héroux; Alain Boucher; Rémy Mullot
Texture feature analysis has undergone tremendous growth in recent years and plays an important role in the analysis of many kinds of images. More recently, texture analysis techniques have become a logical and relevant choice for historical document image segmentation under conditions of significant document image degradation and when information on the document structure, such as the document model and the typographical parameters, is lacking. However, previous work on texture analysis for the segmentation of digitized historical document images has been limited to testing individually one of the well-known texture-based approaches, such as the autocorrelation function, the Grey Level Co-occurrence Matrix (GLCM), Gabor filters, gradients, or wavelets. In this paper we raise the question of which texture-based method is better suited, on the one hand, for discriminating graphical regions from textual ones and, on the other hand, for separating textual regions with different sizes and fonts. The objective of this paper is to compare some of the well-known texture-based approaches (autocorrelation function, GLCM, and Gabor filters) when used for the segmentation of digitized historical document images. The texture features are briefly described, and quantitative results are obtained on simplified historical document images. The achieved results are very encouraging.
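Illustrative sketch (not taken from the paper): a GLCM counts how often grey level i is followed by grey level j at a fixed displacement (dx, dy), and scalar texture statistics are then computed from the normalised matrix. The function names, the single-offset design and the three Haralick-style statistics below are assumptions chosen for illustration; the abstract does not state which GLCM offsets or descriptors were used.

```python
def glcm(image, dx=1, dy=0, levels=None, symmetric=False, normed=True):
    """Grey Level Co-occurrence Matrix of a 2-D list of integer grey levels
    for a single displacement (dx, dy)."""
    rows, cols = len(image), len(image[0])
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0.0] * levels for _ in range(levels)]
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            # Only count pairs whose neighbour falls inside the image.
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1
    if normed:
        total = sum(map(sum, counts))
        if total:
            counts = [[v / total for v in row] for row in counts]
    return counts


def contrast(p):
    """Haralick contrast: large when co-occurring grey levels differ strongly."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def energy(p):
    """Angular second moment: large for uniform, repetitive textures."""
    return sum(v * v for row in p for v in row)


def homogeneity(p):
    """Inverse difference moment: large when mass sits near the diagonal."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))


if __name__ == "__main__":
    img = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 2, 2, 2],
           [2, 2, 3, 3]]
    p = glcm(img, dx=1, dy=0)
    print(contrast(p), energy(p), homogeneity(p))
```

In a pixel-labeling setting, such statistics would typically be computed on a sliding window around each pixel and for several offsets, then concatenated into one feature vector per pixel.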
International Conference on Document Analysis and Recognition | 2013
Maroua Mehri; Pierre Héroux; Petra Gomez-Krämer; Alain Boucher; Rémy Mullot
In the context of historical collection conservation and worldwide diffusion, this paper presents an automatic approach to historical book page layout segmentation. In this article, we propose to search for homogeneous regions in the content of digitized historical books with little a priori knowledge by extracting and analyzing texture features. The novelty of this work lies in the unsupervised clustering of the extracted texture descriptors to find homogeneous regions, i.e. graphic and textual regions, by performing the clustering on an entire book instead of processing each page individually. First, we propose to characterize the content of an entire book by extracting the texture information of each page, as our goal is to compare and index the content of digitized books. The extraction of texture features, computed without any hypothesis on the document structure, is based on two non-parametric tools: the autocorrelation function and multiresolution analysis. Second, we apply an unsupervised clustering approach to the extracted features in order to automatically classify the homogeneous regions of book pages. The clustering results are assessed by internal and external accuracy measures. The overall results are quite satisfactory. Such an analysis could help to build a computer-aided page categorization tool.
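Illustrative sketch (not the authors' implementation): the autocorrelation function measures how similar a texture window is to a shifted copy of itself, so regular structures such as horizontal text lines produce strong, orientation-dependent peaks. The code assumes a mean-removed autocorrelation normalised by the window variance (the usual biased estimator); `directional_profile` is a hypothetical helper, and the multiresolution part of the paper is not reproduced.

```python
def autocorrelation(window, dx, dy):
    """Normalised (mean-removed) autocorrelation of a 2-D window at shift (dx, dy).

    Returns 1.0 for the zero shift; values near 1 mean the pattern repeats at
    that shift, negative values mean it alternates.
    """
    rows, cols = len(window), len(window[0])
    mean = sum(map(sum, window)) / (rows * cols)
    centred = [[v - mean for v in row] for row in window]
    variance = sum(v * v for row in centred for v in row)
    if variance == 0:
        return 0.0  # a flat window carries no texture information
    acc = 0.0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            # Pairs whose shifted partner leaves the window are skipped.
            if 0 <= ny < rows and 0 <= nx < cols:
                acc += centred[y][x] * centred[ny][nx]
    return acc / variance


def directional_profile(window, max_shift=2):
    """Autocorrelation along the horizontal, then the vertical axis,
    for shifts 1..max_shift (hypothetical feature layout)."""
    horizontal = [autocorrelation(window, s, 0) for s in range(1, max_shift + 1)]
    vertical = [autocorrelation(window, 0, s) for s in range(1, max_shift + 1)]
    return horizontal + vertical


if __name__ == "__main__":
    stripes = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
    print(directional_profile(stripes))
```

On horizontal stripes the horizontal shifts stay positive while a one-pixel vertical shift becomes negative, which illustrates how such values can separate line-like textures from more isotropic ones.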
International Journal on Document Analysis and Recognition | 2017
Maroua Mehri; Pierre Héroux; Petra Gomez-Krämer; Rémy Mullot
The use of different texture-based methods is pervasive in different subfields and tasks of document image analysis (DIA), particularly in historical DIA (HDIA). Nevertheless, faced with the large diversity of texture-based methods used for HDIA, a few questions arise. First, which texture-based methods are well suited for segmenting graphical content from textual content, discriminating various text fonts and scales, and separating different types of graphics? Second, which texture-based method represents a good compromise between performance and computational cost? To give satisfactory and clear answers to these questions, this article benchmarks the most classical and widely used texture-based feature sets using a classical texture-based pixel-labeling scheme on a large corpus of historical documents. We focus on determining the performance of each texture-based feature set according to the document content. The results reported in this study provide, first, a qualitative measure of which texture-based feature sets are the most appropriate and, second, a useful benchmark in terms of performance and computational cost for current and future research efforts in HDIA.
Pattern Analysis and Applications | 2017
Maroua Mehri; Petra Gomez-Krämer; Pierre Héroux; Alain Boucher; Rémy Mullot
Over the last few years, there has been tremendous growth in the automatic processing of digitized historical documents. Finding reliable systems for the interpretation of ancient documents has been a topic of major interest for many libraries and a prime research issue in the document analysis community. One important challenge is to refine well-known approaches based on strong a priori knowledge (e.g., the document image content, layout, typography, font size and type, scanning resolution, image size, etc.). Nevertheless, texture analysis has consistently been chosen to segment page layouts when information on document structure and content is lacking. Thus, in this article, a framework is proposed to investigate the use of texture as a tool for automatically determining homogeneous regions in a digitized historical book and segmenting its contents by extracting and analyzing texture features independently of the page layout. The proposed framework is parameter-free and applicable to a large variety of ancient books. It does not assume a priori information regarding document image content and structure. It consists of two phases: a texture-based feature extraction step, and an unsupervised clustering and labeling task based on consensus clustering, hierarchical ascendant classification, and nearest neighbor search algorithms. The novelty of this work lies in clustering the extracted texture descriptors to automatically find homogeneous regions, i.e., graphic and textual regions, by applying the clustering approach to an entire book instead of processing each page individually. Our framework has been evaluated on a large variety of historical books and achieved promising results.
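Illustrative sketch of the clustering and labeling phase, under stated assumptions: hierarchical ascendant classification (agglomerative, bottom-up clustering) groups a small sample of texture feature vectors, here with average linkage and Euclidean distance (both assumptions), and a 1-nearest-neighbour search then propagates the resulting labels to the remaining vectors. The consensus clustering step of the framework is not shown, and all function names are hypothetical.

```python
import math


def average_linkage(points, a, b):
    """Mean pairwise Euclidean distance between two clusters (lists of indices)."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def hac(points, k):
    """Agglomerative (bottom-up) clustering with average linkage.

    Starts with one cluster per point and repeatedly merges the closest pair
    until k clusters remain. Returns one integer label per point. This naive
    version is slow and only suitable for a small sample of feature vectors.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def nearest_neighbour_label(query, points, labels):
    """Give an unlabelled feature vector the label of its closest labelled one."""
    nearest = min(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return labels[nearest]


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    sample_labels = hac(sample, 2)
    print(sample_labels, nearest_neighbour_label((9.5, 10.2), sample, sample_labels))
```

Clustering only a sample and labeling the rest by nearest-neighbour search is one way to keep the cubic cost of agglomerative clustering manageable when a whole book produces many feature vectors.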
International Conference on Pattern Recognition | 2014
Maroua Mehri; Mohamed Mhiri; Pierre Héroux; Petra Gomez-Krämer; Mohamed Ali Mahjoub; Rémy Mullot
Recently, texture-based features have been used for digitized historical document image segmentation. It has been shown that these methods work effectively without a priori knowledge. Moreover, they have been shown to be robust when applied to degraded documents under different noise levels and types. In this paper, an approach for evaluating and comparing texture-based feature sets for segmenting historical documents is presented. We aim to determine which texture features are more adequate, on the one hand, for segmenting graphical regions from textual ones and, on the other hand, for discriminating text in a variety of situations with different fonts and scales. For this purpose, six well-known and widely used texture-based feature sets (autocorrelation function, Grey Level Co-occurrence Matrix, Gabor filters, 3-level Haar wavelet transform, 3-level wavelet transform using the 3-tap Daubechies filter, and 3-level wavelet transform using the 4-tap Daubechies filter) are evaluated and compared on a large corpus of historical documents. Additional insight into the computation time and complexity of each texture-based feature set is given. Qualitative and numerical experiments are also presented to demonstrate the performance of each texture-based feature set.
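Illustrative sketch of a 3-level Haar wavelet feature set, assuming an orthonormal 2-D Haar transform applied recursively to the approximation sub-band and using the mean energy of each detail sub-band as the texture descriptor. This is a common way to turn wavelet coefficients into features, not necessarily the exact descriptor used in the paper; the Daubechies variants are not shown, and the function names are hypothetical.

```python
def haar_level(img):
    """One level of an orthonormal 2-D Haar transform on a 2-D list.

    Each 2x2 block [[a, b], [c, d]] yields one approximation coefficient and
    three detail coefficients; the sum of squares is preserved.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even at every level")
    approx, row_diff, col_diff, diag = [], [], [], []
    for y in range(0, rows, 2):
        r_a, r_r, r_c, r_d = [], [], [], []
        for x in range(0, cols, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            r_a.append((a + b + c + d) / 2)  # local average (low-pass)
            r_r.append((a + b - c - d) / 2)  # top row minus bottom row
            r_c.append((a - b + c - d) / 2)  # left column minus right column
            r_d.append((a - b - c + d) / 2)  # diagonal difference
        approx.append(r_a)
        row_diff.append(r_r)
        col_diff.append(r_c)
        diag.append(r_d)
    return approx, (row_diff, col_diff, diag)


def haar_energy_features(img, levels=3):
    """Mean energy of each detail sub-band over `levels` decomposition levels
    (3 values per level, so 9 values for a 3-level transform)."""
    features = []
    approx = img
    for _ in range(levels):
        approx, details = haar_level(approx)
        for band in details:
            n = len(band) * len(band[0])
            features.append(sum(v * v for row in band for v in row) / n)
    return features


if __name__ == "__main__":
    window = [[(x + y) % 2 for x in range(8)] for y in range(8)]  # checkerboard
    print(haar_energy_features(window, levels=3))
```

The window side must be divisible by 2 to the power of `levels`; for a checkerboard, all the first-level energy lands in the diagonal sub-band, which shows how the sub-band energies encode orientation and scale.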
Document Analysis Systems | 2014
Maroua Mehri; Van Cuong Kieu; Mohamed Mhiri; Pierre Héroux; Petra Gomez-Krämer; Mohamed Ali Mahjoub; Rémy Mullot
For the segmentation of ancient digitized document images, it has been shown that texture feature analysis is a consistent choice for segmenting a page layout under significant and varied degradations. In addition, texture-based approaches have been shown to work effectively without any hypothesis on the document structure, either on the document model or on the typographical parameters. Thus, by investigating the use of texture as a tool for automatically segmenting images, we propose to search for regions of homogeneous and similar content by analyzing texture features based on a multiresolution analysis. The preliminary results show the effectiveness of the texture features extracted from the autocorrelation function, the Grey Level Co-occurrence Matrix (GLCM), and the Gabor filters. In order to assess the robustness of the proposed texture-based approaches, images under numerous degradation models are generated, and two image enhancement algorithms (non-local means filtering and superpixel techniques) are evaluated with several accuracy metrics. This study shows the robustness of texture feature extraction for segmentation in the presence of noise and that a denoising step is unnecessary.
Document Recognition and Retrieval | 2015
Maroua Mehri; Nabil Sliti; Pierre Héroux; Petra Gomez-Krämer; Najoua Essoukri Ben Amara; Rémy Mullot
Designing reliable and fast segmentation algorithms for ancient documents has been a topic of major interest for many libraries and a prime research issue in the document analysis community. Thus, we propose in this article a fast ancient document enhancement and segmentation algorithm based on Simple Linear Iterative Clustering (SLIC) superpixels and Gabor descriptors in a multi-scale approach. First, in order to obtain enhanced backgrounds of noisy ancient documents, a novel foreground/background segmentation algorithm based on SLIC superpixels is introduced. Once the SLIC technique has been applied, the background and foreground superpixels are classified. Then, an enhanced, noise-free background is obtained by processing the background superpixels. Subsequently, Gabor descriptors are extracted only from the selected foreground superpixels of the enhanced gray-level ancient book document images using a multi-scale approach. Finally, for ancient document image segmentation, a foreground superpixel clustering task is performed by partitioning the Gabor-based feature sets into compact and well-separated clusters in the feature space. The proposed algorithm does not assume any a priori information regarding document image content and structure and provides interesting results on a large corpus of ancient documents. Qualitative and numerical experiments are given to demonstrate the enhancement and segmentation quality.
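Illustrative sketch of a multi-scale Gabor descriptor for one square patch, assuming the standard real-valued Gabor kernel g(x, y) = exp(-(x'^2 + gamma^2 y'^2) / (2 sigma^2)) cos(2 pi x' / lambda + psi), with (x', y') the coordinates rotated by theta, and taking the absolute filter response at the patch centre for each (scale, orientation) pair. The SLIC superpixel step, the background enhancement and the clustering are not shown; the parameter values (sigma, wavelength, gamma = 0.5, four orientations) and all function names are illustrative assumptions, not the settings of the paper.

```python
import math


def gabor_kernel(half, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real Gabor kernel of size (2*half+1)^2: a Gaussian envelope times a
    cosine carrier oscillating along the direction given by theta."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotated x'
            yr = -x * math.sin(theta) + y * math.cos(theta)  # rotated y'
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel


def gabor_response(patch, kernel):
    """Correlation of a patch with a kernel of the same size (response at the centre)."""
    return sum(p * w for prow, krow in zip(patch, kernel) for p, w in zip(prow, krow))


def gabor_features(patch, scales, n_orient=4):
    """Absolute responses for every (sigma, wavelength) scale and n_orient
    orientations evenly spaced in [0, pi). The patch must be square with odd side."""
    half = len(patch) // 2
    features = []
    for sigma, lambd in scales:
        for t in range(n_orient):
            theta = t * math.pi / n_orient
            kernel = gabor_kernel(half, sigma, theta, lambd)
            features.append(abs(gabor_response(patch, kernel)))
    return features


if __name__ == "__main__":
    # Vertical stripes: intensity varies along x with a period of 4 pixels.
    patch = [[math.cos(math.pi * x / 2) for x in range(-6, 7)] for _ in range(13)]
    print(gabor_features(patch, [(2.0, 4.0), (4.0, 8.0)]))
```

With theta = 0 the carrier oscillates along x, so vertical stripes of matching wavelength give a much stronger response than the theta = pi/2 filter; in a superpixel-based pipeline such a vector would presumably be computed around each foreground superpixel.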
Proceedings of the 3rd International Workshop on Historical Document Imaging and Processing | 2015
Maroua Mehri; Nibal Nayef; Pierre Héroux; Petra Gomez-Krämer; Rémy Mullot
Many challenges and open issues have been raised by the tremendous growth in digitizing collections of cultural heritage documents, such as information retrieval in digital libraries or analyzing the page content of historical books. Recently, graphic/text segmentation in historical documents has posed specific challenges due to many particularities of historical document images (e.g. noise and degradation, presence of handwriting, overlapping layouts, great variability of page layout). To cope with those challenges, a method based on learning texture features for historical document image enhancement and segmentation is proposed in this article. The proposed method is based on simple linear iterative clustering (SLIC) superpixels, Gabor descriptors and support vector machines (SVM). It has been evaluated on 100 document images selected from the datasets of the historical document layout analysis and historical book recognition competitions held in the context of the ICDAR conference and the HIP workshop (2011 and 2013). The evaluation, based on manually labeled ground truth, demonstrates the enhancement and segmentation quality and shows the effectiveness of the proposed method through qualitative and numerical experiments. The proposed method provides interesting results on historical document images having various page layouts and different typographical and graphical properties.
International Conference on Pattern Recognition Applications and Methods | 2014
Maroua Mehri; Petra Gomez-Krämer; Pierre Héroux; Alain Boucher; Rémy Mullot
As our goal is to compare and index the content of ancient digitized books, we propose to analyze different extracted texture features in order to segment homogeneous regions in the content of ancient digitized books with little a priori knowledge. To this end, we present and evaluate in this paper a complete framework for the comparative analysis of texture features for the segmentation and characterization of ancient book pages. First, we characterize the content of an entire book by extracting the texture attributes of each page. The extraction of the texture features is based on a multiresolution analysis. Second, we perform unsupervised clustering in order to automatically classify the homogeneous regions of book pages. Namely, we compare two approaches based on two different statistical categories of texture features, autocorrelation and co-occurrence, for segmenting the content of ancient book pages. Several clustering and classification accuracy measures show the effectiveness of the proposed framework. Tests on different book contents (text vs. graphics, manuscript vs. printed) show that these texture features are more suitable for distinguishing textual regions from graphical ones than for distinguishing text fonts.