Publication


Featured research published by Mickaël Coustaty.


ACM/IEEE Joint Conference on Digital Libraries | 2017

Impact of OCR errors on the use of digital libraries: towards a better access to information

Guillaume Chiron; Antoine Doucet; Mickaël Coustaty; Muriel Visani; Jean-Philippe Moreux

Digital collections are increasingly used for a variety of purposes. In Europe alone, we can conservatively estimate that tens of thousands of users consult digital libraries daily. This usage is often motivated by qualitative and quantitative research. However, caution is advised, as most digitized documents are indexed through their OCRed version, which is far from perfect, especially for ancient documents. In this paper, we aim to estimate the impact of OCR errors on the use of a major online platform: the Gallica digital library from the National Library of France. It accounts for more than 100M OCRed documents and receives 80M search queries every year. In this context, we introduce two main contributions. First, an original corpus of OCRed documents composed of 12M characters, along with the corresponding gold standard, is presented and provided, with an equal share of English- and French-written documents. Next, statistics on OCR errors have been computed thanks to a novel alignment method introduced in this paper. Making use of all the user queries submitted to the Gallica portal over 4 months, we take advantage of our error model to propose an indicator for predicting the relative risk that queried terms mismatch targeted resources due to OCR errors, underlining the critical extent to which OCR quality affects digital library access.


Pattern Recognition | 2017

Fuzzy generalized median graphs computation: Application to content-based document retrieval

Ramzi Chaieb; Karim Kalti; Muhammad Muzzamil Luqman; Mickaël Coustaty; Jean-Marc Ogier; Najoua Essoukri Ben Amara

The fuzzy median graph is an important new concept that can represent a set of fuzzy graphs by a representative fuzzy graph prototype. However, the computation of a fuzzy median graph remains a computationally expensive task. In this paper, we propose a new approximate algorithm for the computation of the Fuzzy Generalized Median Graph (FGMG) based on Fuzzy Attributed Relational Graph (FARG) embedding in a suitable vector space in order to capture the maximum information in graphs and to improve the accuracy and speed of document image retrieval processing. In this study, we focus on the application of FGMGs to the Content-based Document Retrieval (CBDR) problem. Experiments on real and synthetic databases containing a large number of FARGs with large sizes show that a CBDR using the FGMG as a dataset representative yields better results than an exhaustive and sequential retrieval in terms of accuracy and processing time.


The New Review of Hypermedia and Multimedia | 2018

SentiML++: an extension of the SentiML sentiment annotation scheme

Malik M. Saad Missen; Mickaël Coustaty; Nadeem Salamat; V. B. Surya Prasath

The amount of opinionated data on the web has increased exponentially, especially after the emergence of online social networks. To deal with this huge deluge of data, we need robust mechanisms that can help identify all aspects of an opinion segment and support the automatic processing of opinion data. Recently, there have been a few developments in this direction, and different sentiment annotation schemes have been proposed, such as SentiML, OpinionMiningML, and EmotionML. In this work, we propose SentiML++, an extension of SentiML with a focus on annotating opinions, and further answering aspects of the general question “who has what opinion about whom in which context?”. A detailed comparison with SentiML and other existing annotation schemes is also presented. The data collection annotated with SentiML has been annotated with SentiML++ and is available for download for further research purposes. Experiments with data collections annotated with SentiML and SentiML++ show that SentiML++ is a significant and valuable addition to SentiML.


Pattern Recognition Letters | 2018

New spatial-organization-based scale and rotation invariant features for heterogeneous-content camera-based document image retrieval

Quoc Bao Dang; Mickaël Coustaty; Muhammad Muzzamil Luqman; Jean-Marc Ogier; Cao De Tran

In this paper, we extend our earlier proposed feature descriptor named Scale and Rotation Invariant Features (SRIF) and a camera-based heterogeneous-content information spotting system based on the latter. Through its capacity to manage heterogeneous content in document images, SRIF represents an extension to existing strategies such as LLAH, which are dedicated to textual document images. This paper proposes new extensions of SRIF based on geometrical constraints between pairs of nearest points around a keypoint. SRIF has built-in capabilities to deal with the feature point extraction errors that are introduced in camera-captured documents. To validate our method and compare it to the state of the art, we have constructed three datasets of heterogeneous-content document images, along with the corresponding ground truths. Our experimental results confirm that SRIF outperforms the state of the art in terms of processing time, with equal or greater recall and precision for retrieval and spotting results.


International Conference on Emerging Security Technologies | 2017

A dataset for forgery detection and spotting in document images

Nicolas Sidere; Francisco Cruz; Mickaël Coustaty; Jean-Marc Ogier

In the last decades, the explosion of the volume of digital document images and the development of consumer tools to modify these images have led to a huge increase in reported fraudulent document cases. This situation has promoted the development of automatic methods for both preventing forgeries in modified documents and detecting them. However, document forensics is a sensitive topic. Data is usually either private or unlabeled, and most of the reported works are evaluated on datasets with restricted access. In this paper, we present a new public dataset made of a corpus of 477 corrupted payslips in which nearly 6000 characters were forged. Since it is provided with a reliable ground truth, we expect this dataset to be useful for many works in the digital forensics research domain.


ACM/IEEE Joint Conference on Digital Libraries | 2017

Touchdoc: a tool to bridge the gap between physical and digital libraries

Nicolas Sidere; Cyrille Suire; Mickaël Coustaty; Joseph Chazalon; Jean-Christophe Burie; Jean-Marc Ogier

In this paper, we explore the concept of the augmented document and present a new user experience to digitize a document, modify its layout, and edit its content by designing specific interfaces on multi-touch devices and using advanced techniques in document analysis. This framework exploits image processing tools to facilitate manipulations that are natural with paper documents but complex in their digital versions. In addition, we open discussions on bridging the gap between physical and digital libraries by improving user experience with the use of this platform.


International Conference on Document Analysis and Recognition | 2017

Enhancing Table of Contents Extraction by System Aggregation

Thi-Tuyet-Hai Nguyen; Antoine Doucet; Mickaël Coustaty


International Conference on Document Analysis and Recognition | 2017

Local Enlacement Histograms for Historical Drop Caps Style Recognition

Michaël Clément; Mickaël Coustaty; Camille Kurtz; Laurent Wendling


Document Analysis Systems | 2018

Feature Selection for Document Flow Segmentation

Ahmed Hamdi; Mickaël Coustaty; Aurelie Joseph; Vincent Poulain d'Andecy; Antoine Doucet; Jean-Marc Ogier


International Conference on Document Analysis and Recognition | 2017

ICDAR2017 Competition on Post-OCR Text Correction

Guillaume Chiron; Antoine Doucet; Mickaël Coustaty; Jean-Philippe Moreux

Collaboration


Dive into Mickaël Coustaty's collaborations.

Top Co-Authors

Jean-Marc Ogier

University of La Rochelle

Antoine Doucet

University of La Rochelle

Joseph Chazalon

University of La Rochelle

Muriel Visani

University of La Rochelle

Nicolas Sidere

University of La Rochelle

Nicolas Sidère

François Rabelais University

Stéphane Bres

Institut National des Sciences Appliquées de Lyon
