Van Cuong Kieu
University of Bordeaux
Publication
Featured research published by Van Cuong Kieu.
graphics recognition | 2013
Alicia Fornés; Van Cuong Kieu; Muriel Visani; Nicholas Journet; Anjan Dutta
The first competition on music scores, organized at ICDAR and GREC in 2011, awakened the interest of researchers, who participated in both the staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real-case scenario involving old and degraded music scores. For this purpose, we have generated a new set of semi-synthetic images using two degradation models that we previously introduced: local noise and 3D distortions. In this extended paper we provide a more detailed description of the dataset, degradation models, evaluation metrics, the participants' methods, and the results that could not be presented in the ICDAR and GREC proceedings due to page limitations.
international conference on document analysis and recognition | 2013
Muriel Visani; Van Cuong Kieu; Alicia Fornés; Nicholas Journet
The first competition on music scores, organized at ICDAR in 2011, awakened the interest of researchers, who participated in both the staff removal and writer identification tasks. In this second edition, we focus on the staff removal task and simulate a real-case scenario: old music scores. For this purpose, we have generated a new set of images using two kinds of degradations: local noise and 3D distortions. This paper describes the dataset, distortion methods, evaluation metrics, the participants' methods, and the obtained results.
international conference on document analysis and recognition | 2013
Van Cuong Kieu; Nicholas Journet; Muriel Visani; Rémy Mullot; Jean-Philippe Domenger
This article presents a method for generating semi-synthetic images of old documents whose pages may be torn or otherwise not flat. Existing methods that rely only on 2D deformation models produce unrealistic synthetic document images, so we propose a 3D approach for reproducing the geometric distortions of real documents. First, a new texture-coordinate generation technique extracts the texture coordinates of each vertex of the document shape (a mesh obtained by 3D scanning a real degraded document). Then, any 2D document image can be overlaid on the mesh using an existing texture-mapping method. As a result, many complex real geometric distortions can be reproduced in the generated synthetic images, which can then be used to enrich training sets or for performance evaluation. This degradation method is used jointly with the character degradation model we proposed in [1] to generate the 6,000 semi-synthetic degraded images of the ICDAR 2013 music score staff removal competition.
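As a rough illustration of the overlay step, the sketch below warps a flat document image according to per-vertex texture coordinates of a projected mesh. The function name, the use of SciPy's griddata for the dense inverse map, and the white background fill are assumptions made for this sketch; the paper applies an existing texture-mapping method to the full 3D mesh.

import numpy as np
from scipy.interpolate import griddata

def warp_onto_mesh(flat_img, verts_xy, tex_uv, out_shape):
    """Overlay a flat 2D document image onto projected mesh vertices
    (illustrative sketch, not the paper's renderer).

    flat_img : 2-D uint8 grayscale document image
    verts_xy : (N, 2) mesh vertices projected onto the output image plane
    tex_uv   : (N, 2) per-vertex texture coordinates in [0, 1]^2
    out_shape: (height, width) of the distorted output image
    """
    h_out, w_out = out_shape
    h_in, w_in = flat_img.shape[:2]

    # Dense inverse map: interpolate each output pixel's (u, v) texture
    # coordinate from the scattered per-vertex coordinates.
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    u = griddata(verts_xy, tex_uv[:, 0], (xs, ys), method="linear")
    v = griddata(verts_xy, tex_uv[:, 1], (xs, ys), method="linear")

    # Sample the flat image (nearest-neighbour lookup for simplicity).
    outside = np.isnan(u)
    cols = np.clip(np.nan_to_num(u) * (w_in - 1), 0, w_in - 1).astype(int)
    rows = np.clip(np.nan_to_num(v) * (h_in - 1), 0, h_in - 1).astype(int)
    warped = flat_img[rows, cols]
    warped[outside] = 255  # paper-white background outside the mesh
    return warped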
document analysis systems | 2014
Maroua Mehri; Van Cuong Kieu; Mohamed Mhiri; Pierre Héroux; Petra Gomez-Krämer; Mohamed Ali Mahjoub; Rémy Mullot
For the segmentation of ancient digitized document images, texture feature analysis has been shown to be a consistent choice for segmenting a page layout under significant and varied degradations. Moreover, texture-based approaches work effectively without any hypothesis on the document structure, document model, or typographical parameters. We therefore investigate texture as a tool for automatic image segmentation, searching for homogeneous regions of similar content by analyzing texture features within a multiresolution framework. Preliminary results show the effectiveness of texture features extracted from the autocorrelation function, the Grey Level Co-occurrence Matrix (GLCM), and Gabor filters. To assess the robustness of the proposed texture-based approaches, images under numerous degradation models are generated, and two image enhancement algorithms (non-local means filtering and superpixel techniques) are evaluated with several accuracy metrics. This study shows the robustness of texture feature extraction for segmentation in the presence of noise, making a denoising step unnecessary.
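For concreteness, a minimal per-window descriptor combining the three texture families the abstract names can be sketched with scikit-image; the GLCM distances and angles, the Gabor frequency and orientations, and the autocorrelation statistic below are illustrative choices, not the settings of the paper.

import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.filters import gabor

def texture_features(window):
    """Texture descriptor for one grayscale analysis window (uint8)."""
    # Grey Level Co-occurrence Matrix statistics
    glcm = graycomatrix(window, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    feats = [graycoprops(glcm, prop).mean()
             for prop in ("contrast", "homogeneity", "energy")]
    # Gabor filter bank: mean response magnitude at a few orientations
    for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        real, imag = gabor(window, frequency=0.2, theta=theta)
        feats.append(np.hypot(real, imag).mean())
    # Autocorrelation function via FFT; a rough spread statistic
    f = np.fft.fft2(window.astype(float))
    ac = np.fft.fftshift(np.fft.ifft2(np.abs(f) ** 2).real)
    feats.append(ac.mean() / ac.max())
    return np.asarray(feats)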
Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing | 2013
Van Cuong Kieu; Muriel Visani; Nicholas Journet; Rémy Mullot; Jean-Philippe Domenger
This paper presents an efficient parametrization method for generating synthetic noise on document images. By specifying the desired categories and amount of noise, the method can generate synthetic document images exhibiting most of the degradations observed in real document images (ink splotches, white specks, or streaks). The ability to simulate different amounts and kinds of noise makes it possible to evaluate the robustness of many document image analysis methods, and to generate data for algorithms that rely on a learning process. The degradation model presented in [7] needs eight parameters to randomly generate noise regions. We propose an extension of this model that sets the eight parameters automatically so as to generate precisely what a user wants (amount and category of noise). Our proposition consists of three steps. First, Nsp seed points (i.e. centres of noise regions) are selected by an adaptive procedure. Then, these seed points are classified into three noise categories using a heuristic rule. Finally, the size of each noise region is set by a random process in order to generate degradations that are as realistic as possible.
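A minimal sketch of the three-step procedure follows; the edge-based seed selection, the category probabilities, and the size range are illustrative heuristics, not the automatically set parameters of the paper.

import numpy as np

def add_local_noise(img, n_sp, seed=0):
    """Three-step local noise sketch (illustrative values throughout).

    img  : 2-D uint8 document image, 0 = ink, 255 = paper
    n_sp : number of seed points (centres of noise regions)
    """
    rng = np.random.default_rng(seed)
    out = img.copy()

    # Step 1 -- adaptive seed-point selection: real specks and splotches
    # cluster near ink strokes, so sample pixels on ink/paper transitions.
    edges = np.argwhere(np.abs(np.diff(img.astype(int), axis=1)) > 128)
    edges = edges[(edges[:, 0] > 4) & (edges[:, 0] < img.shape[0] - 5)
                  & (edges[:, 1] > 4) & (edges[:, 1] < img.shape[1] - 5)]
    seeds = edges[rng.choice(len(edges), size=min(n_sp, len(edges)),
                             replace=False)]

    for y, x in seeds:
        # Step 2 -- heuristic rule assigning each seed one of the three
        # noise categories observed in real documents.
        kind = rng.choice(["splotch", "speck", "streak"], p=[0.4, 0.4, 0.2])
        # Step 3 -- region size drawn at random so degradations vary.
        r = int(rng.integers(1, 5))
        yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
        disk = yy ** 2 + xx ** 2 <= r ** 2
        region = out[y - r:y + r + 1, x - r:x + r + 1]
        if kind == "splotch":   # dark ink blob
            region[disk] = 0
        elif kind == "speck":   # white hole inside a stroke
            region[disk] = 255
        else:                   # short horizontal dark streak
            out[y, max(x - 4 * r, 0):x + 4 * r] = 0
    return out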
advanced concepts for intelligent vision systems | 2015
Van Cuong Kieu; Florence Cloppet; Nicole Vincent
The efficiency of document image processing techniques depends on image quality, which is impaired by many sources of degradation. These sources can lie in the document itself or arise from the acquisition process. In this paper, we address blur degradation without any prior knowledge of the blur origin. We propose to evaluate a blur parameter locally, on predefined zones, without relying on any blur model. This parameter is linked to a fuzzy statistical analysis of the textual part of the document extracted from the initial image. The proposed measure is evaluated on the DIQA database, where the correlation between blur degree and OCR accuracy is computed. The results show that our blur estimation can help predict OCR accuracy.
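A toy version of the local evaluation is sketched below, scoring each predefined zone of a grayscale page; the variance of the Laplacian stands in for the paper's fuzzy statistical measure, and the zone size is arbitrary.

import numpy as np
from scipy.ndimage import laplace

def local_blur_scores(gray, zone=64):
    """Blur score per zone: low variance of the Laplacian (little
    high-frequency content) suggests a blurrier zone. Stand-in feature,
    not the paper's fuzzy measure."""
    h, w = gray.shape
    scores = {}
    for y in range(0, h - zone + 1, zone):
        for x in range(0, w - zone + 1, zone):
            z = gray[y:y + zone, x:x + zone].astype(float)
            scores[(y, x)] = laplace(z).var()
    return scores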
Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing | 2013
Andreas Fischer; Muriel Visani; Van Cuong Kieu; Ching Y. Suen
Historical documents pose challenging problems for training handwriting recognition systems. Besides the high variability of character shapes inherent to all handwriting, the image quality can also differ greatly, for instance due to faded ink, ink bleed-through, and wrinkled or stained parchment. Especially when only a few learning samples are available, it is difficult to capture this variability in the morphological character models. In this paper, we investigate the use of image degradation to generate synthetic learning samples for historical handwriting recognition. For three image degradation models, we report significant improvements in recognition accuracy with hidden Markov models on the medieval Saint Gall and Parzival data sets.
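The augmentation idea reads roughly as follows; degrade_fns is a hypothetical list of callables implementing the degradation models, and each sample is an (image, transcription) pair.

import numpy as np

def augment_with_degradations(samples, degrade_fns, copies=2, seed=0):
    """Enlarge a small training set with synthetically degraded variants
    of each sample (hypothetical interface, not the paper's pipeline)."""
    rng = np.random.default_rng(seed)
    augmented = list(samples)
    for img, label in samples:
        for _ in range(copies):
            degrade = degrade_fns[rng.integers(len(degrade_fns))]
            augmented.append((degrade(img), label))  # label is unchanged
    return augmented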
Pattern Recognition Letters | 2017
Van Cuong Kieu; Florence Cloppet; Nicole Vincent
Highlights: a new local approach based on clustering blur and non-blur classes to deal with heterogeneous blur in document images; a non-linear approach; a new blur feature for clustering blur and non-blur classes; a study of the impact of blur on OCR accuracy; the first comparison on standard databases.

In this paper, we propose a local blur estimation for document images captured by portable cameras. A novel blur pixel feature is extracted from pixel properties in working zones to initialize a fuzzy clustering into blur and non-blur classes. At the end of the process, a blur region is determined for each working zone, and the blur score is given by the average of the membership values of the pixels in the blur region. A quantitative evaluation on two real databases (DIQA and an industrial database) shows that our method achieves good results in comparison with recent methods evaluated on these databases.
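The clustering stage can be illustrated with a tiny two-class fuzzy c-means over a one-dimensional feature; the feature itself, the random initialization, and the assumption that the cluster with the larger centre is the blur class are placeholders for the paper's novel blur pixel feature.

import numpy as np

def blur_membership(x, m=2.0, n_iter=50, seed=0):
    """Two-class fuzzy c-means on a 1-D feature vector x; returns each
    sample's membership to the (assumed) blur cluster."""
    rng = np.random.default_rng(seed)
    u = rng.random((2, len(x)))
    u /= u.sum(axis=0)                        # memberships sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = (um @ x) / um.sum(axis=1)   # weighted cluster centres
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u = d ** (-2.0 / (m - 1.0))           # standard FCM update
        u /= u.sum(axis=0)
    return u[np.argmax(centers)]              # assume larger centre = blur

# Blur score of a working zone = average membership of its pixels:
# score = blur_membership(feature_values_of_zone).mean()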
document analysis systems | 2016
Van Cuong Kieu; Florence Cloppet; Nicole Vincent
In this paper, we propose an OCR accuracy prediction method based on local blur estimation, since blur is one of the factors that most damages OCR accuracy. First, we apply the blur estimation to synthetic images blurred with Gaussian and motion blur, in order to investigate the relation between blur and character size with regard to OCR accuracy. This relation is used as a blur/character-size feature to define a classifier, which separates the characters of a given document into three classes: readable, intermediate, and non-readable. The quality score of the document is then inferred from the three classes. The proposed method is evaluated on a published database and on an industrial one, and its correlation with OCR accuracy is compared with state-of-the-art methods.
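A sketch of the three-class scoring might look like this; the size normalisation, the two thresholds, and the half-weight for intermediate characters are hypothetical, whereas the paper learns the decision from its blur/character-size feature.

def document_quality(chars, t_read=0.3, t_nonread=0.7):
    """chars: list of (blur_score, char_height_px) tuples, blur in [0, 1].
    Returns a quality score in [0, 1] from the three readability classes."""
    readable = intermediate = 0
    for blur, height in chars:
        # Larger characters tolerate more blur, so normalise by size
        # (the 32 px reference height is an arbitrary choice).
        effective = blur / max(height / 32.0, 1e-6)
        if effective < t_read:
            readable += 1
        elif effective < t_nonread:
            intermediate += 1
    n = max(len(chars), 1)
    # Readable characters count fully, intermediate ones count half.
    return (readable + 0.5 * intermediate) / n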
international conference on pattern recognition | 2016
Van Cuong Kieu; Florence Cloppet; Nicole Vincent
Image quality is an important topic, since most image analysis and recognition systems are sensitive to degradation. In this paper, we are interested in the quality of document images, with the aim of recovering text content with as few errors as possible. Blur is one of the defects that degrades the Optical Character Recognition (OCR) rate, so we propose to detect blur and attenuate its effect on OCR results. As blur is non-uniform over the document area, we adopt a local approach, with no prior model for blur, which may be of various natures. Using a local clustering based on a novel blur feature, we build a deblurring procedure adapted to heterogeneous blur, so that the blurred image is corrected locally according to the blur type. This eases character segmentation and thus makes OCR more efficient. Experiments are carried out on the public DIQA database together with an OCR engine; the results show that the proposed method yields an overall improvement of 11% in OCR rate.
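A toy version of the blur-adaptive correction: zones flagged as blurred by a simple feature are sharpened by unsharp masking. Both the feature and the single correction are stand-ins for the paper's clustering and type-specific deblurring; the threshold and sharpening amount are illustrative.

import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def deblur_locally(gray, zone=64, thresh=50.0, amount=1.5):
    """Correct blurred zones of a grayscale page (illustrative stand-in
    for the paper's heterogeneous deblurring)."""
    out = gray.astype(float)
    h, w = gray.shape
    for y in range(0, h - zone + 1, zone):
        for x in range(0, w - zone + 1, zone):
            z = out[y:y + zone, x:x + zone]
            if laplace(z).var() < thresh:  # zone looks locally blurred
                smooth = gaussian_filter(z, sigma=1.0)
                out[y:y + zone, x:x + zone] = z + amount * (z - smooth)
    return np.clip(out, 0, 255).astype(np.uint8)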