Pierre Héroux
University of Rouen
Publications
Featured research published by Pierre Héroux.
International Conference on Pattern Recognition | 1998
Pierre Héroux; Sébastien Diana; Arnaud Ribert; Eric Trupin
We present three classifiers used in automatic forms class identification. The first category of classifier includes the k-nearest neighbours (kNN) and the multilayer perceptron (MLP) classifiers. The second category corresponds to a new structural classifier based on tree comparison. The low level information based on a pyramidal decomposition of the document image is used by the kNN and the MLP classifiers, while the high level information represents the form content with a hierarchical structure used by the new structural classifier. Experimental results are presented. Some strategies of classifier co-operation are proposed.
Pattern Recognition | 2012
Pierre Le Bodic; Pierre Héroux; Sébastien Adam; Yves Lecourtier
This paper tackles the problem of substitution-tolerant subgraph isomorphism which is a specific class of error-tolerant isomorphism. This problem aims at finding a subgraph isomorphism of a pattern graph S in a target graph G. This isomorphism only considers label substitutions and forbids vertex and edge insertion in G. This kind of subgraph isomorphism is often needed in pattern recognition problems when graphs are attributed with real values and no exact matching can be found between attributes due to noise. Our proposal to solve the problem of substitution-tolerant subgraph isomorphism relies on its formulation in the Integer Linear Program (ILP) formalism. Using a general ILP solver, the approach is able to find, if one exists, a mapping of a pattern graph into a target graph such that the topology of the searched graph is kept and the editing operations between the labels have a minimal cost. This technique is evaluated on both a set of synthetic graphs and a problem of symbol detection in technical drawings. In the second case, document and symbol images are represented by vector-attributed Region Adjacency Graphs built from a segmentation process. Obtained results demonstrate the relevance of considering subgraph isomorphism as an optimization process.
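The problem statement above can be illustrated with a minimal sketch. It is not the ILP formulation from the paper: an exhaustive search over injective vertex mappings stands in for the ILP solver, and the function name, the graph encoding, and the absolute-difference substitution cost are all assumptions made for illustration. It is only practical for very small graphs.

```python
from itertools import permutations


def best_substitution_mapping(pattern_labels, pattern_edges, target_labels, target_edges):
    """Return (cost, mapping) of the cheapest topology-preserving mapping.

    Vertex labels are lists of floats; edges are dicts {(u, v): float_label}
    for directed edges. The absolute difference used as substitution cost is
    an illustrative assumption, not the cost function of the paper.
    """
    best_cost, best_mapping = None, None
    n = len(pattern_labels)
    # Enumerate every injective assignment of pattern vertices to target vertices.
    for image in permutations(range(len(target_labels)), n):
        # Topology constraint: each pattern edge must land on a target edge.
        if any((image[u], image[v]) not in target_edges for (u, v) in pattern_edges):
            continue
        # Cost of substituting vertex labels, then edge labels.
        cost = sum(abs(pattern_labels[i] - target_labels[image[i]]) for i in range(n))
        cost += sum(
            abs(label - target_edges[(image[u], image[v])])
            for (u, v), label in pattern_edges.items()
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_mapping = cost, image
    return best_cost, best_mapping
```

The function returns `(None, None)` when no mapping preserves the pattern topology, which mirrors the "if one exists" condition in the abstract.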
International Conference on Document Analysis and Recognition | 2009
Nicolas Sidère; Pierre Héroux; Jean-Yves Ramel
In this article we present a new approach for the classification of structured data using graphs. We propose to address the complexity of measuring the distance between graphs by using a new graph signature. We present an extension of the vector representation based on pattern frequency, which integrates labeling information. We compare the results achieved on public graph databases for the classification of symbols and letters using this graph signature with those obtained using the graph edit distance.
Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing | 2013
Maroua Mehri; Petra Gomez-Krämer; Pierre Héroux; Alain Boucher; Rémy Mullot
Texture feature analysis has undergone tremendous growth in recent years. It plays an important role in the analysis of many kinds of images. More recently, the use of texture analysis techniques for historical document image segmentation has become a logical and relevant choice when document images are significantly degraded and when information on the document structure, such as the document model and the typographical parameters, is lacking. However, previous work on the use of texture analysis for segmentation of digitized historical document images has been limited to testing separately one of the well-known texture-based approaches such as autocorrelation function, Grey Level Co-occurrence Matrix (GLCM), Gabor filters, gradient, wavelets, etc. In this paper we raise the question of which texture-based method is better suited for discriminating graphical regions from textual ones on the one hand, and for separating textual regions with different sizes and fonts on the other hand. The objective of this paper is to compare some of the well-known texture-based approaches: autocorrelation function, GLCM, and Gabor filters, used in a segmentation of digitized historical document images. Texture features are briefly described and quantitative results are obtained on simplified historical document images. The achieved results are very encouraging.
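As an illustration of one of the texture descriptors named above, here is a minimal sketch of a Grey Level Co-occurrence Matrix and the contrast feature derived from it. The function names, the list-of-lists image encoding, and the default offset are assumptions for this example; the paper's actual feature set and parameters are not given in the abstract.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Count how often grey level i co-occurs with grey level j at offset (dx, dy).

    `image` is a list of rows of integers in range(levels).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast: squared grey-level difference averaged over all counted pairs."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    weighted = sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total
```

Smooth regions concentrate counts near the diagonal and give low contrast, while regions with abrupt grey-level changes give higher values, which is the kind of cue a texture-based segmentation can cluster on.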
Computer Vision and Image Understanding | 2011
Romain Raveaux; Sébastien Adam; Pierre Héroux; Éric Trupin
This paper presents some new approaches for computing graph prototypes in the context of the design of a structural nearest prototype classifier. Four kinds of prototypes are investigated and compared: set median graphs, generalized median graphs, set discriminative graphs and generalized discriminative graphs. They differ according to (i) the graph space where they are searched for and (ii) the objective function which is used for their computation. The first criterion distinguishes set prototypes, which are selected in the initial graph training set, from generalized prototypes, which are generated in an infinite set of graphs. The second criterion distinguishes median graphs, which minimize the sum of distances to all input graphs of a given class, from discriminative graphs, which are computed using classification performance as criterion, taking into account the inter-class distribution. For each kind of prototype, the proposed approach can identify one or many prototypes per class, in order to manage the trade-off between the classification accuracy and the classification time. Each graph prototype generation/selection is performed through a genetic algorithm which can be specialized to each case by setting the appropriate encoding scheme, fitness and genetic operators. An experimental study performed on several graph databases shows the superiority of the generation approach over the selection one. On the other hand, discriminative prototypes outperform the generative ones. Moreover, we show that the classification rates improve as the number of prototypes increases. Finally, we show that discriminative prototypes give better results than the median graph based classifier.
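The set median graph defined above (the training graph that minimizes the sum of distances to all graphs of its class) can be sketched in a few lines. This sketch uses a direct search rather than the genetic algorithm of the paper, and the toy distance, which counts edges present in only one of the two graphs, is a hypothetical stand-in for a real graph distance such as the graph edit distance.

```python
def set_median(graphs, distance):
    """Return the graph of `graphs` with the smallest sum of distances to the others."""
    return min(graphs, key=lambda g: sum(distance(g, other) for other in graphs))


def edge_difference(g1, g2):
    """Toy distance (assumption): number of edges present in exactly one graph.

    Graphs are encoded as frozensets of (u, v) edges.
    """
    return len(g1 ^ g2)


# Three small graphs of one class; the middle one is closest to both others.
class_graphs = [
    frozenset({(0, 1)}),
    frozenset({(0, 1), (1, 2)}),
    frozenset({(0, 1), (1, 2), (2, 3)}),
]
prototype = set_median(class_graphs, edge_difference)
```

A generalized median would instead be searched for outside the training set, which is why the paper resorts to an optimization procedure for that case.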
International Conference on Document Analysis and Recognition | 2007
Pierre Héroux; Eugen Barbu; Sébastien Adam; Eric Trupin
Performance evaluation for document image analysis and understanding is a recurring problem. Many ground-truthed document image databases are now used to evaluate algorithms, but these databases are less useful for the design of a complete system in a precise context. This paper proposes an approach for the automatic generation of synthesised document images and associated ground-truth information based on a derivation of publishing tools. An implementation of this approach illustrates the richness of the produced information.
International Conference on Document Analysis and Recognition | 2009
Pierre Le Bodic; Hervé Locteau; Sébastien Adam; Pierre Héroux; Yves Lecourtier; Arnaud Knippel
In this paper, we tackle the problem of localizing graphical symbols on complex technical document images by using an original approach to solve the subgraph isomorphism problem. In the proposed system, document and symbol images are represented by vector-attributed Region Adjacency Graphs (RAG) which are extracted by a segmentation process and feature extractors. Vertices representing regions are labeled with shape descriptors whereas edges are labeled with feature vectors representing topological relations between the regions. Then, in order to search for the instances of a model graph describing a particular symbol in a large graph corresponding to a whole document, we model the subgraph isomorphism problem as an Integer Linear Program (ILP), which makes the matching error-tolerant on vectorial labels. The problem is then solved using a free, efficient solver called SYMPHONY. The whole system is evaluated on a set of synthetic documents.
Graphics Recognition | 2005
Eugen Barbu; Pierre Héroux; Sébastien Adam; Eric Trupin
A database is only useful if it is associated with a set of procedures for retrieving the elements relevant to the users’ needs. Many IR techniques have been developed for automatic indexing and retrieval in document databases. Most of these use indexes depending on the textual content of documents, and very few are able to handle graphical or image content without human annotation. This paper describes an approach similar to the bag of words technique for automatic indexing of graphical document image databases, and different ways to subsequently query these databases. In an unsupervised manner, this approach proposes a set of automatically discovered symbols that can be combined with logical operators to build queries.
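The querying idea described above can be sketched with an inverted index, assuming the symbol discovery step has already produced a bag of symbol identifiers per document. The document names, the symbol identifiers, and both function names are hypothetical; Python set operators play the role of the logical operators.

```python
def build_index(documents):
    """Map each symbol id to the set of documents that contain it.

    `documents` is a dict {doc_id: list of symbol ids found in that document}.
    """
    index = {}
    for doc_id, symbols in documents.items():
        for symbol in set(symbols):
            index.setdefault(symbol, set()).add(doc_id)
    return index


def docs_with(index, symbol):
    """Return a copy of the set of documents containing `symbol`."""
    return set(index.get(symbol, ()))


# Hypothetical output of the unsupervised symbol discovery step.
documents = {
    "plan_a": ["s1", "s2", "s2"],
    "plan_b": ["s1", "s3"],
    "plan_c": ["s2", "s3"],
}
index = build_index(documents)

# Query: (s1 AND s2) OR (s3 AND NOT s1)
result = (docs_with(index, "s1") & docs_with(index, "s2")) | (
    docs_with(index, "s3") - docs_with(index, "s1")
)
```

Here `&`, `|`, and `-` correspond to AND, OR, and AND NOT over the sets of matching documents.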
Document Recognition and Retrieval | 2013
Maroua Mehri; Petra Gomez-Krämer; Pierre Héroux; Rémy Mullot
Recent progress in the digitization of heterogeneous collections of ancient documents has raised new challenges in information retrieval in digital libraries and document layout analysis. Therefore, in order to control the quality of historical document image digitization and to meet the need for a characterization of their content using intermediate level metadata (between image and document structure), we propose a fast automatic layout segmentation of old document images based on five descriptors. Those descriptors, based on the autocorrelation function, are obtained by multiresolution analysis and used afterwards in a specific clustering method. The method proposed in this article has the advantage that it is performed without any hypothesis on the document structure, either about the document model (physical structure), or the typographical parameters (logical structure). It is also parameter-free since it automatically adapts to the image content. In this paper, firstly, we detail our proposal to characterize the content of old documents by extracting the autocorrelation features in the different areas of a page and at several resolutions. Then, we show that it is possible to automatically find the homogeneous regions defined by similar indices of autocorrelation without knowledge about the number of clusters using adapted hierarchical ascendant classification and consensus clustering approaches. To assess our method, we apply our algorithm on 316 old document images, which encompass six centuries (1200-1900) of French history, in order to demonstrate the performance of our proposal in terms of segmentation and characterization of heterogeneous corpus content. Moreover, we define a new evaluation metric, the homogeneity measure, which aims at evaluating the segmentation and characterization accuracy of our methodology. We obtain a mean homogeneity accuracy of 85%. Those results help to represent a document by a hierarchy of layout structure and content, and to define one or more signatures for each page, on the basis of a hierarchical representation of homogeneous blocks and their topology.
GbRPR'07: Proceedings of the 6th IAPR-TC-15 International Conference on Graph-Based Representations in Pattern Recognition | 2007
Romain Raveaux; Eugen Barbu; Hervé Locteau; Sébastien Adam; Pierre Héroux; Eric Trupin
In this paper, a graph classification approach based on a multi-objective genetic algorithm is presented. The method consists in learning sets of synthetic graph prototypes which are used for a classification step. These prototypes are generated by simultaneously maximizing the recognition rate and minimizing the confusion rate. With such an approach, the algorithm provides a range of solutions, the (confusion, recognition) pairs, from which one can be chosen to suit the needs of the system. Experiments are performed on real data sets, representing 10 symbols. These tests demonstrate the benefit of producing prototypes instead of finding representatives which simply belong to the data set.
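The "range of solutions" produced by a multi-objective optimization is commonly the set of non-dominated trade-offs. The sketch below filters (confusion, recognition) pairs down to that set; it illustrates the selection criterion only, not the genetic algorithm itself, and the function name and sample values are invented for the example.

```python
def pareto_front(solutions):
    """Keep the (confusion, recognition) pairs that no other pair dominates.

    A pair dominates another if its confusion is no higher and its
    recognition is no lower, and the two pairs are not identical.
    """
    def dominates(a, b):
        return a[0] <= b[0] and a[1] >= b[1] and a != b

    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical (confusion rate, recognition rate) pairs for candidate prototype sets.
candidates = [(0.10, 0.80), (0.20, 0.90), (0.30, 0.85), (0.10, 0.70)]
front = pareto_front(candidates)
```

Each pair kept in `front` is a different compromise between low confusion and high recognition, so the final choice can be left to the requirements of the target system.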