Abdel Belaïd

Centre national de la recherche scientifique

Publication

Featured research published by Abdel Belaïd.

International Journal on Document Analysis and Recognition | 2001

Automatic extraction of printed mathematical formulas using fuzzy logic and propagation of context

A. Kacem; Abdel Belaïd; M. Ben Ahmed

Abstract. This paper describes a new method to segment printed mathematical documents precisely and extract formulas automatically from their images. Unlike classical methods, it is more directed towards segmentation rather than recognition, isolating mathematical formulas outside and inside text-lines. Our ultimate goal is to delimit parts of text that could disturb OCR applications, not yet trained for formula recognition and restructuring. The method is based on a global and a local segmentation. The global segmentation separates isolated formulas from the text lines using a primary labeling. The local segmentation propagates the context around the mathematical operators met to discard embedded formulas from plain text. The primary labeling identifies some mathematical symbols by models created at a learning step using fuzzy logic. The secondary labeling reinforces the results of the primary labeling and locates the subscripts and the superscripts inside the text. A heuristic has been defined that guides this automatic process. In this paper, the different modules making up the automated system for segmenting mathematical documents are presented with examples of results. Experiments carried out on some commonly seen mathematical documents show that our proposed method can achieve quite satisfactory rates, making mathematical formula extraction more feasible for real-world applications. The average rate of primary labeling of mathematical operators is about 95.3% and their secondary labeling can improve the rate by about 4%. The formula extraction rate, evaluated with 300 formulas and 100 mathematical documents having variable complexity, is close to 93%.

International Journal on Document Analysis and Recognition | 2001

Recognition of table of contents for electronic library consulting

Abdel Belaïd

Abstract. A labelling approach for the automatic recognition of tables of contents (ToC) is described in this paper. A prototype is used for the electronic consulting of scientific papers in a digital library system named Calliope. This method operates on a roughly structured ASCII file, produced by OCR. The recognition approach operates by text labelling without using any a priori model. Labelling is based on part-of-speech (PoS) tagging, which is initiated by a primary labelling of text components using some specific dictionaries. Significant tags are first grouped into homogeneous classes according to their grammar categories and then reduced to canonical forms corresponding to article fields: “title” and “authors”. Non-labelled tokens are assigned to one field or the other by either applying PoS correction rules or using a structure model generated from well-detected articles. The designed prototype operates very well on different ToC layouts and character recognition qualities. Without manual intervention, a 96.3% rate of correct segmentation was obtained on 38 journals, including 2,020 articles, accompanied by a 93.0% rate of correct field extraction.

International Conference on Document Analysis and Recognition | 1995

Item searching in forms: Application to French tax form

Y. Belaid; Abdel Belaïd; E. Turolla

Cell searching is an important step in form analysis. Information in a form is contained mainly inside its cells. The goal of this paper is to describe a robust method to locate the items whose boundaries are lines without using any a priori information about the form. Our method is based on the detection of lines by the Hough transform and on searching for cycles, corresponding to cell locations, in a graph. Thanks to the Hough transform, our approach is robust, skew-independent and can be applied to several kinds of lines such as continuous, dashed, doubled, etc.
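As a rough illustration of why Hough voting tolerates dashed lines, the following minimal sketch accumulates votes in (theta, rho) space for a set of foreground pixels. It is not the authors' implementation: the function names, the one-degree angular resolution and the unit rho bins are all assumptions made for the example.

```python
import math
from collections import Counter


def hough_votes(points, n_theta=180):
    """Accumulate votes in (theta index, rho bin) space for foreground pixels."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho))] += 1  # unit-width rho bins (assumption)
    return votes


def strongest_line(points, n_theta=180):
    """Return (theta in degrees, rho bin, vote count) of the best-supported line."""
    (t, rho), count = hough_votes(points, n_theta).most_common(1)[0]
    return 180.0 * t / n_theta, rho, count


# A dashed horizontal line at y = 5: five pixels on, five pixels off.
dashed = [(x, 5) for x in range(100) if (x // 5) % 2 == 0]
print(strongest_line(dashed))
```

All pixels of the dashed line vote for the same (theta, rho) bin even though the line is interrupted, which is the property the abstract relies on for dashed and doubled lines.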

International Conference on Document Analysis and Recognition | 2005

Rejection strategy for convolutional neural network by adaptive topology applied to handwritten digits recognition

Hubert Cecotti; Abdel Belaïd

In this paper, we propose a rejection strategy for convolutional neural network models. The purpose of this work is to adapt the network's topology as a function of the geometrical error. A self-organizing map is used to change the links between the layers, leading to a geometric image transformation occurring directly inside the network. Instead of learning all the possible deformations of a pattern, ambiguous patterns are rejected and the network's topology is modified as a function of their geometric errors by a specialized self-organizing map. Our objective is to show how an adaptive topology, without new training, can improve the recognition of rejected patterns in the case of handwritten digits.

International Conference on Document Analysis and Recognition | 2005

Hybrid OCR combination approach complemented by a specialized ICR applied on ancient documents

Hubert Cecotti; Abdel Belaïd

In spite of the improvement of commercial optical character recognition (OCR) systems in recent years, their ability to process different kinds of documents can also be a weakness: they cannot produce perfect recognition for all documents, although they give good results in standard cases. We propose in this paper a model combining several OCRs and a specialized ICR (intelligent character recognition) based on a convolutional neural network to complement them. Instead of just running several OCRs in parallel and applying a fusing rule to the results, a specialized neural network with an adaptive topology is added to complement the OCRs as a function of their errors. This system has been tested on ancient documents containing old characters and old fonts not used in contemporary documents. The OCR combination increases the recognition rate by about 3%, whereas the ICR improves the recognition of rejected characters by more than 5%.

International Conference on Document Analysis and Recognition | 1999

EXTRAFOR: automatic EXTRAction of mathematical FORmulas

Afef Kacem; Abdel Belaïd; M. Ben Ahmed

We present a method for automatic extraction of mathematical formulas from images of documents without character recognition. Formula extraction is first done by location of its most significant symbols, then extension to adjoining symbols using contextual rules until delimitation of the whole formula space. Mathematical symbol labelling is realised from models created at the learning stage using fuzzy logic. From the experiments, we found that the average rate of primary labelling of mathematical symbols is about 95.3%. The obtained results have demonstrated the applicability of our system since 90% of mathematical formulas are well extracted from documents printed with high quality.

Graphics Recognition | 1995

Form Item Extraction Based on Line Searching

Eric Turolla; Yolande Belaïd; Abdel Belaïd

This paper presents an item searching method which has been applied to various kinds of forms. This approach is based on line detection through the Hough transform. After obtaining the straight lines, Hough directions are used to detect the real segments in the image. Segments can correspond either to a continuous line or to the black parts of dashed or dotted lines. The segments are therefore grouped together and classified between two adjacent line crossing points. Items are located by searching for the minimum cycles of the graph constructed from the line intersection points. The last step consists of verifying the line classes based on the homogeneity hypothesis of item sides.
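The idea of locating items as minimum cycles can be sketched on a simplified case. The example below assumes an axis-aligned form whose line crossing points already sit on a regular grid of nodes, and it only looks for rectangular cycles; the paper's graph search is more general. The names `full_grid` and `find_cells` are hypothetical.

```python
def full_grid(n_rows, n_cols):
    """Build every unit edge between neighbouring crossing points of a grid."""
    edges = set()
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:
                edges.add(((r, c), (r, c + 1)))  # horizontal unit edge
            if r + 1 < n_rows:
                edges.add(((r, c), (r + 1, c)))  # vertical unit edge
    return edges


def find_cells(edges, n_rows, n_cols):
    """Return rectangles (r1, c1, r2, c2) whose border is complete and whose
    interior contains no edge, i.e. minimal rectangular cycles."""
    def h(r, c1, c2):  # presence of each unit edge along row r
        return [((r, c), (r, c + 1)) in edges for c in range(c1, c2)]

    def v(c, r1, r2):  # presence of each unit edge along column c
        return [((r, c), (r + 1, c)) in edges for r in range(r1, r2)]

    cells = []
    for r1 in range(n_rows):
        for c1 in range(n_cols):
            for r2 in range(r1 + 1, n_rows):
                for c2 in range(c1 + 1, n_cols):
                    border = (h(r1, c1, c2) + h(r2, c1, c2)
                              + v(c1, r1, r2) + v(c2, r1, r2))
                    inside = [e for r in range(r1 + 1, r2) for e in h(r, c1, c2)]
                    inside += [e for c in range(c1 + 1, c2) for e in v(c, r1, r2)]
                    if all(border) and not any(inside):
                        cells.append((r1, c1, r2, c2))
    return cells


# 3 x 3 crossing points; removing one vertical edge merges the two top cells.
edges = full_grid(3, 3)
edges.discard(((0, 1), (1, 1)))
cells = find_cells(edges, 3, 3)
print(cells)
```

The brute-force enumeration is only suitable for small grids; it is meant to show what a minimal cycle means for a form cell, not to be efficient.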

International Journal on Document Analysis and Recognition | 1998

Retrospective Document Conversion: Application to the Library Domain

Abdel Belaïd

Abstract. This paper describes a framework for retrospective document conversion in the library domain. Drawing on the experience and insight gained from projects launched over the present decade by the European Commission, it outlines the requirements for solving the problem of retroconversion and traces the main phases of associated processing. To highlight the main problems encountered in this area, the paper also outlines studies conducted by our group in the MORE project for the retroconversion of old catalogues belonging to two different libraries: the National French Library and the Royal Belgian Library. For the French Library, the idea was to study the feasibility of a recognition approach avoiding the use of OCR and basing the strategy mainly on visual features. The challenge was to recognize a logical structure from its physical aspects. The modest results obtained from experiments for this first study led us, in the second study, to base the structural recognition methodology more on the logical aspects by focusing the analysis on the content. Furthermore, for the Belgian references, the aim was to convert reference catalogues into a more conventional UNIMARC format while respecting the industrial constraints. Without manual intervention, a 75% rate of correct recognition was obtained on 11 catalogues containing about 4548 references.

Document Recognition and Retrieval | 2009

Segmentation of continuous document flow by a modified backward-forward algorithm

Thomas Meilender; Abdel Belaïd

This paper describes a method for segmenting a continuous document flow. A document flow is a list of successive scanned pages, put in a production chain, representing several documents without explicit separation marks between them. To separate the documents for their recognition, it is necessary to analyze the content of the successive pages and to identify the boundary pages of each document. The method proposed here is similar to the variable horizon models (VHM) or multi-grams used in speech recognition. It consists in maximizing the flow likelihood given all the Markov models of the constituent elements. As the calculation of this likelihood over the whole flow is NP-complete, the solution consists in studying it in windows of reduced observations. The first results obtained on homogeneous flows of invoices reach more than 75% precision and 90% recall.
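To make the idea of likelihood-maximizing segmentation concrete, here is a minimal sketch that uses a plain Viterbi-style dynamic program, not the paper's modified backward-forward algorithm. Everything in it is an assumption for illustration: pages are reduced to two toy labels ("H" for a header-like page, "B" for a body page), the probabilities in `START` and `CONT` are invented, and `window` is interpreted simply as the maximum number of pages examined for one document.

```python
import math

# Hypothetical page model: probability of a label as the first page of a
# document (START) or as a continuation page (CONT). Values are invented.
START = {"H": 0.9, "B": 0.1}
CONT = {"H": 0.2, "B": 0.8}


def segment_loglik(pages):
    """Log-likelihood of a run of pages forming exactly one document."""
    score = math.log(START[pages[0]])
    for label in pages[1:]:
        score += math.log(CONT[label])
    return score


def segment_flow(pages, window=4):
    """Split the flow into documents of at most `window` pages so that the
    summed log-likelihood of all documents is maximal."""
    n = len(pages)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for pages[:i]
    back = [0] * (n + 1)            # back[i]: start of the last document
    for end in range(1, n + 1):
        for start in range(max(0, end - window), end):
            cand = best[start] + segment_loglik(pages[start:end])
            if cand > best[end]:
                best[end], back[end] = cand, start
    bounds, end = [], n
    while end > 0:                  # walk the back-pointers to recover cuts
        bounds.append((back[end], end))
        end = back[end]
    return bounds[::-1]


flow = list("HBBHBHBB")
print(segment_flow(flow))
```

Restricting the search to a bounded window keeps the cost linear in the flow length for a fixed window size, which is the practical motivation the abstract gives for working on reduced observations.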

International Conference on Advances in Pattern Recognition | 2005

High performance classifiers combination for handwritten digit recognition

Hubert Cecotti; Szilárd Vajda; Abdel Belaïd

This paper presents a multi-classifier system using classifiers based on two different approaches. A stochastic model using Markov Random Fields is combined with different kinds of neural networks by several fusing rules. It has been shown that the combination of different classifiers can improve the global recognition rate. We propose to compare different fusing rules in a framework composed of classifiers with high accuracies. We show that a complementarity still remains between classifiers, even those from the same approach, which improves the global recognition rate. The combinations have been tested on handwritten digits. The overall recognition rate has reached 99.03% without using any rejection criteria.
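The abstract does not list which fusing rules were compared, so the sketch below shows three standard ones (sum, product and majority vote) as an assumption. Each classifier is represented only by its vector of class scores; the function names and the example scores are hypothetical.

```python
from collections import Counter
from math import prod


def argmax(scores):
    """Index of the highest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def sum_rule(outputs):
    """Pick the class with the largest summed score across classifiers."""
    return argmax([sum(col) for col in zip(*outputs)])


def product_rule(outputs):
    """Pick the class with the largest product of scores across classifiers."""
    return argmax([prod(col) for col in zip(*outputs)])


def majority_vote(outputs):
    """Pick the class chosen by the largest number of classifiers."""
    return Counter(argmax(o) for o in outputs).most_common(1)[0][0]


# Three classifiers, three classes: one very confident vote for class 0,
# two moderately confident votes for class 1.
outputs = [
    [0.9, 0.1, 0.0],
    [0.4, 0.6, 0.0],
    [0.4, 0.6, 0.0],
]
print(sum_rule(outputs), product_rule(outputs), majority_vote(outputs))
```

On this input the sum and product rules follow the single confident classifier while the majority vote follows the two others, which shows why the choice of fusing rule matters when classifiers disagree.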