Sébastien Eskenazi
University of La Rochelle
Publications
Featured research published by Sébastien Eskenazi.
International Conference on Document Analysis and Recognition | 2015
Jean-Christophe Burie; Joseph Chazalon; Mickaël Coustaty; Sébastien Eskenazi; Muhammad Muzzamil Luqman; Maroua Mehri; Nibal Nayef; Jean-Marc Ogier; Sophea Prum; Marçal Rusiñol
Smartphones are enabling new ways of capture, which creates the need for seamless and reliable acquisition and digitization of documents in order to convert them to an editable, searchable and more human-readable format. Current state-of-the-art work lacks databases and baseline benchmarks for digitizing mobile-captured documents. We have organized a competition on mobile document capture and OCR in order to address this issue. The competition is structured into two independent challenges: smartphone document capture, and smartphone OCR. This report describes the datasets for both challenges along with their ground truth, details the performance evaluation protocols that we used, and presents the final results of the participating methods. In total, we received 13 submissions: 8 for challenge 1 and 5 for challenge 2.
Pattern Recognition | 2017
Sébastien Eskenazi; Petra Gomez-Krämer; Jean-Marc Ogier
In document image analysis, segmentation is the task that identifies the regions of a document. The increasing number of applications of document analysis requires a good knowledge of the available technologies. This survey highlights the variety of the approaches that have been proposed for document image segmentation since 2008. It provides a clear typology of documents and of document image segmentation algorithms. We also discuss the technical limitations of these algorithms, the way they are evaluated and the general trends of the community.
International Conference on Document Analysis and Recognition | 2015
Nibal Nayef; Muhammad Muzzamil Luqman; Sophea Prum; Sébastien Eskenazi; Joseph Chazalon; Jean-Marc Ogier
Smartphones are enabling new ways of capture, which creates the need for seamless and reliable acquisition and digitization of documents. The quality assessment step is an important part of both the acquisition and the digitization processes. Assessing document quality could aid users during the capture process or help improve image enhancement methods after a document has been captured. The field of document image quality assessment currently lacks databases. In order to provide a baseline benchmark for quality assessment methods for mobile-captured documents, we present in this paper a dataset for quality assessment that contains both singly- and multiply-distorted document images. The proposed dataset can be used for benchmarking quality assessment methods by the objective measure of OCR accuracy, and can also be used to benchmark quality enhancement methods. There are three types of documents in the dataset: modern documents, old administrative letters and receipts. The document images of the dataset are captured under varying capture conditions (light, different types of blur and perspective angles). This causes geometric and photometric distortions that hinder the OCR process. The ground truth of the dataset images consists of the text transcriptions of the documents, the OCR results of the captured documents and the values of the different capture parameters used for each image. We also present how the dataset can be used for evaluation in the field of no-reference quality assessment. The dataset is freely and publicly available for use by the research community at http://navidomass.univ-lr.fr/SmartDoc-QA.
International Workshop on Computational Forensics | 2015
Sébastien Eskenazi; Petra Gomez-Krämer; Jean-Marc Ogier
It is very easy to ensure the authenticity of a digital document or of a paper document. However, this security is seriously weakened when the document crosses the border between the material and the digital worlds. This paper presents the beginning of our work towards the creation of a document signature that would solve this security issue. Our primary finding is that current state-of-the-art document analysis algorithms need to be re-evaluated under the criterion of robustness, as we have done for OCR processing.
Document Engineering | 2015
Sébastien Eskenazi; Petra Gomez-Krämer; Jean-Marc Ogier
Security applications related to document authentication require an exact match between an authentic copy and the original of a document. This implies that the document analysis algorithms that are used to compare two documents (original and copy) should provide the same output. This kind of algorithm includes the computation of layout descriptors from the segmentation result, as the layout of a document is a part of its semantic content. To this end, this paper presents a new layout descriptor that significantly improves the state of the art. The basis of this descriptor is a Delaunay triangulation of the centroids of the document regions. This triangulation is seen as a graph, and the adjacency matrix of the graph forms the descriptor. While most layout descriptors have a stability of 0% with regard to an exact match, our descriptor has a stability of 74%, which can be brought up to 100% with the use of an appropriate matching algorithm. It also achieves 100% accuracy and retrieval in a document retrieval scheme on a database of 960 document images. Furthermore, this descriptor is extremely efficient, as it performs a search in constant time with respect to the size of the document database and it reduces the size of the index of the database by a factor of 400.
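The core construction described in this abstract can be illustrated with a short sketch. This is not the authors' implementation: it simply triangulates region centroids with SciPy and returns the adjacency matrix of the resulting graph. The centroid coordinates in the example are made up for illustration.

```python
# Sketch of a Delaunay-based layout descriptor: the adjacency matrix of the
# Delaunay triangulation of the document regions' centroids.
import numpy as np
from scipy.spatial import Delaunay

def layout_descriptor(centroids):
    """Return the adjacency matrix of the Delaunay graph of region centroids."""
    pts = np.asarray(centroids, dtype=float)
    tri = Delaunay(pts)
    n = len(pts)
    adj = np.zeros((n, n), dtype=int)
    for simplex in tri.simplices:          # each simplex is a triangle (i, j, k)
        for a in range(3):
            for b in range(a + 1, 3):
                i, j = simplex[a], simplex[b]
                adj[i, j] = adj[j, i] = 1  # mark each triangle edge once, symmetrically
    return adj

# Example: five hypothetical region centroids (x, y) on a toy page layout.
descriptor = layout_descriptor([(10, 10), (90, 10), (50, 50), (10, 90), (90, 90)])
```

Because the descriptor is a plain binary matrix, comparing two documents reduces to an exact matrix comparison, which is what makes an exact-match ("same output") criterion feasible.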
IEEE International Conference on Fuzzy Systems | 2016
Ngoc Bich Dao; Sébastien Eskenazi; Karell Bertet; Arnaud Revel
The ability to obtain a compact description of an object while keeping the significant information can be crucial for many applications such as indexing, clustering and classification. Formal concept analysis (FCA) provides an algorithm to perform such dimension reduction. However, the requirements of FCA limit its performance. In order to relax these requirements, this paper presents an extension of FCA with the definition and formal analysis of a fuzzy precedence graph. This is completed by a fuzzy dimension reduction algorithm. We have tested our algorithm on several real datasets related to bags of visual words and evaluated it on a classification task. The proposed formalism and algorithm improve on FCA tools in several cases while retaining the significant information. Furthermore, the fuzzy dimension reduction algorithm never performs worse than FCA tools in terms of dimension reduction. We have also compared it with PCA dimension reduction. PCA usually removes more attributes but loses more information, and does so in an unreliable and unstable manner compared to our algorithm.
Document Analysis Systems | 2016
Sébastien Eskenazi; Petra Gomez-Krämer; Jean-Marc Ogier
The importance of having stable information extraction algorithms for security-related applications, and more generally for industrial use cases, has recently been highlighted. Stability is what makes an algorithm reliable, as it gives a guarantee that the results will be reproducible on similar data. Without it, security criteria such as the probability of false positives cannot be quantified. As a consequence, no security application can be built from an unstable algorithm. In a document verification framework, the probability of false positives indicates the probability that two different results are given for two copies of the same document. This paper builds on our previous work on a stable layout descriptor to study the stability of four segmentation algorithms. We consider a segmentation algorithm to be stable if it produces the same layout for all copies of the same document. The algorithms studied are two versions of PAL, Voronoi, and JSEG. We compare the stability of the different algorithms and study the factors influencing their stability.
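The exact-match notion of stability used in this line of work can be written down in a few lines. This is an illustrative sketch, not the paper's evaluation protocol: given the layout results produced for several copies of the same document, it reports the fraction of copies whose result is identical to the reference copy's result.

```python
def stability(results):
    """Fraction of copies whose layout result exactly matches the first
    (reference) copy. `results` is a list of comparable layout results,
    one per copy of the same document; equality must be exact, since
    "close" values are not good enough for authentication."""
    ref, copies = results[0], results[1:]
    if not copies:
        return 1.0  # a single copy is trivially consistent with itself
    return sum(r == ref for r in copies) / len(copies)
```

Under this definition, a descriptor with "74% stability" would mean that 74% of the copies reproduce the reference result exactly, and an unstable algorithm is one where this fraction cannot be pushed to 100%.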
International Conference on Document Analysis and Recognition | 2015
Sébastien Eskenazi; Petra Gomez-Krämer; Jean-Marc Ogier
Current security applications rely on the performance of the algorithms that they use. For document authentication, document analysis algorithms should be precise enough to detect any modification. They should also be stable enough that a document and its photocopy yield the same result. This requirement is one of absolute stability: having close values is not enough, the values need to be exactly the same. This paper presents our preliminary work on the case of a stable layout descriptor. While everyone knows that thresholds are a source of instability, they are still common practice. We describe a promising layout descriptor which drastically reduces the number of thresholds compared to the state of the art. Unfortunately, it is not stable enough when tested on real data: there are still too many thresholds. This paper opens and justifies the path towards algorithms without any threshold.
International Conference on Document Analysis and Recognition | 2017
Sébastien Eskenazi; Boris Bodin; Petra Gomez-Krämer; Jean-Marc Ogier