
Publication


Featured research published by Gabriel de França Pereira e Silva.


Expert Systems With Applications | 2013

Assessing sentence scoring techniques for extractive text summarization

Rafael Ferreira; Luciano de Souza Cabral; Rafael Dueire Lins; Gabriel de França Pereira e Silva; Fred Freitas; George D. C. Cavalcanti; Rinaldo Lima; Steven J. Simske; Luciano Favaro

Text summarization is the process of automatically creating a shorter version of one or more text documents. It is an important way of finding relevant information in large text libraries or on the Internet. Text summarization techniques are essentially classified as extractive and abstractive. Extractive techniques perform summarization by selecting sentences of documents according to some criteria; abstractive summaries attempt to improve the coherence among sentences by eliminating redundancies and clarifying the context of sentences. Sentence scoring is the most widely used technique for extractive text summarization. This paper describes and performs a quantitative and qualitative assessment of 15 sentence-scoring algorithms available in the literature, evaluated on three different datasets (news, blogs, and articles). In addition, directions to improve the sentence extraction results obtained are suggested.
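As an illustration of the sentence-scoring idea assessed in this paper, the sketch below implements one of the simplest classic strategies, word-frequency scoring. It is a minimal example, not the paper's implementation; the sentence splitter and the frequency-sum scoring are simplifying assumptions:

```python
# A minimal word-frequency sentence scorer: each sentence is scored by
# the summed corpus frequency of its words (one classic strategy among
# those assessed; not the paper's implementation).
import re
from collections import Counter

def score_sentences(text):
    """Map each sentence to the summed frequency of its words."""
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    freq = Counter(re.findall(r'\w+', text.lower()))
    return {s: sum(freq[w] for w in re.findall(r'\w+', s.lower()))
            for s in sentences}

def summarize(text, n=1):
    """Return the n highest-scoring sentences, kept in original order."""
    scores = score_sentences(text)
    top = set(sorted(scores, key=scores.get, reverse=True)[:n])
    return ' '.join(s for s in scores if s in top)
```

With `n=1` on a toy paragraph, the scorer selects the sentence whose words recur most often across the whole text.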


Expert Systems With Applications | 2014

A multi-document summarization system based on statistics and linguistic treatment

Rafael Ferreira; Luciano de Souza Cabral; Frederico Luiz Gonçalves de Freitas; Rafael Dueire Lins; Gabriel de França Pereira e Silva; Steven J. Simske; Luciano Favaro

The quantity of data available on the Internet today has reached such a volume that it has become humanly unfeasible to efficiently sieve useful information from it. One solution to this problem is offered by text summarization techniques. Text summarization, the process of automatically creating a shorter version of one or more text documents, is an important way of finding relevant information in large text libraries or on the Internet. This paper presents a multi-document summarization system that concisely extracts the main aspects of a set of documents while trying to avoid the typical problems of this type of summarization: information redundancy and diversity. This is achieved through a new sentence clustering algorithm based on a graph model that makes use of statistical similarity and linguistic treatment. The DUC 2002 dataset was used to assess the performance of the proposed system, which surpassed the DUC competitors by a 50% margin of f-measure in the best case.
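The graph-based sentence clustering idea can be sketched as follows: sentences are nodes, an edge joins any pair whose similarity exceeds a threshold, and connected components form the clusters. This is a simplification for illustration only; the similarity measure (word-level Jaccard) and the threshold are assumptions, not the paper's model:

```python
# Hedged sketch: cluster sentences by word-overlap similarity on a
# graph, then a summarizer could keep one representative per cluster
# to curb redundancy. Simplified illustration, not the paper's system.
import re

def jaccard(a, b):
    """Word-level Jaccard similarity of two sentences."""
    wa = set(re.findall(r'\w+', a.lower()))
    wb = set(re.findall(r'\w+', b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def cluster_sentences(sentences, threshold=0.5):
    """Connected components of the similarity graph above the threshold."""
    n = len(sentences)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(sentences[i], sentences[j]) >= threshold:
                parent[find(i)] = find(j)  # union the two components
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(sentences[i])
    return list(clusters.values())
```

Keeping only one sentence per cluster is a common way to address the redundancy problem the abstract mentions.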


International Conference on Document Analysis and Recognition | 2009

Image Classification to Improve Printing Quality of Mixed-Type Documents

Rafael Dueire Lins; Gabriel de França Pereira e Silva; Steven J. Simske; Jian Fan; Mark Q. Shaw; Paulo Sá; Marcelo Thielo

Functional image classification is the assignment of different image types to separate classes to optimize their rendering for reading or other specific end tasks, and is an important area of research in the publishing and multimedia industries. This paper presents recent research on optimizing the simultaneous classification of documents, photos, and logos. Each of these is handled during printing with a class-specific pipeline of image transformation algorithms, and misclassification results in pejorative imaging effects. This paper reports on replacing an existing classifier with a Weka-based classifier that simultaneously improves accuracy (from 85.3% to 90.8%) and performance (from 1458 ms to 418 ms/image). Generic subsampling of the images further improved the performance (to 199 ms/image) with only a modest impact on accuracy (to 90.4%). A staggered subsampling approach, finally, improved both accuracy (to 96.4%) and performance (to 147 ms/image) for the Weka-based classifier. This approach did not appreciably benefit the HP classifier (85.4% accuracy, 497 ms/image). These data indicate that staggered subsampling using the optimized Weka classifier substantially improves classification accuracy and performance without introducing additional “egregious” misclassifications (assigning photos or logos to the “document” class).
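The subsampling ideas can be sketched on a plain 2-D pixel grid: generic subsampling keeps every k-th row and column, while a staggered variant shifts the column offset on each kept row so more distinct columns are covered overall. The step size, offset scheme, and list-of-lists representation are assumptions for illustration, not the paper's pipeline:

```python
# Hedged sketch of two subsampling schemes for speeding up image
# classification. Illustrative only; the paper's actual pipeline and
# staggering strategy are more elaborate.

def subsample(image, step=4):
    """Generic subsampling: keep every `step`-th row and column."""
    return [row[::step] for row in image[::step]]

def staggered_subsample(image, step=4):
    """Staggered subsampling: shift the column offset on each kept row,
    so successive kept rows sample different pixel columns."""
    return [image[r][(r // step) % step::step]
            for r in range(0, len(image), step)]
```

Both keep roughly 1/step² of the pixels, which is where the reported speedups come from; the staggered version spreads those samples more evenly.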


International Conference on Document Analysis and Recognition | 2011

An Automatic Method for Enhancing Character Recognition in Degraded Historical Documents

Gabriel de França Pereira e Silva; Rafael Dueire Lins

Automatic optical character recognition is an important research area in document processing. There are several commercial tools for this purpose, which are becoming more efficient every day. In the case of historical documents, however, there is still much to be improved, due to the presence of noise and degradation. This paper presents a new approach for enhancing character recognition in degraded historical documents. The proposed system consists of identifying regions in which there is information loss due to physical document degradation and processing the document with possible candidates for the correct text transcription.
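One simple way to realize the "possible candidates" idea is to rank lexicon entries by string similarity against a degraded OCR output. The sketch below uses the standard-library `difflib` for this; the lexicon and the cutoff are assumptions, and this is not the paper's method:

```python
# Hedged sketch: propose a correction for a degraded OCR word by
# picking the closest entry in a lexicon (assumed, illustrative only).
import difflib

def correct_word(word, lexicon, cutoff=0.6):
    """Return the closest lexicon entry, or the word itself if none
    is similar enough (similarity below `cutoff`)."""
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word
```

A real system would restrict this to the regions identified as degraded and weight candidates by context, but the candidate-ranking core looks like this.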


International Conference on Pattern Recognition | 2010

Enhancing the Filtering-Out of the Back-to-Front Interference in Color Documents with a Neural Classifier

Gabriel de França Pereira e Silva; Rafael Dueire Lins; João Marcelo Monte da Silva; S. Banergee; A. Kuchibhotla; Marcelo Thielo

Back-to-front, show-through, or bleeding are the names given to the interference that appears whenever one writes or prints on both sides of translucent paper. Such interference degrades image binarization and document transcription via OCR. The technical literature presents several algorithms to remove back-to-front noise, but no algorithm is good enough in all cases. This article presents a new technique to remove such noise in color documents that makes use of neural classifiers to evaluate the intensity of the interference and to indicate the presence of blur. Such a classifier allows tuning the parameters of an algorithm for back-to-front interference removal and document enhancement.


International Conference on Image Analysis and Recognition | 2007

Enhancing document images acquired using portable digital cameras

Rafael Dueire Lins; André R. Gomes e Silva; Gabriel de França Pereira e Silva

Portable digital cameras are in widespread use today. Their image quality, low cost, and portability have drastically changed the culture of photography. Professionals in many different areas have started to take photos of documents instead of photocopying them. This article presents techniques for improving the quality of document images acquired with portable digital cameras.


Document Engineering | 2015

Automatic Text Document Summarization Based on Machine Learning

Gabriel de França Pereira e Silva; Rafael Ferreira; Rafael Dueire Lins; Luciano de Souza Cabral; Hilário Oliveira; Steven J. Simske; Marcelo Riss

The need for the automatic generation of summaries gained importance with the unprecedented volume of information available on the Internet. Automatic systems based on extractive summarization techniques select the most significant sentences of one or more texts to generate a summary. This article makes use of machine learning techniques to assess the quality of the twenty most referenced strategies used in extractive summarization, integrating them into a tool. Quantitative and qualitative aspects were considered in the assessment, demonstrating the validity of the proposed scheme. The experiments were performed on the CNN-corpus, possibly the largest and most suitable test corpus today for benchmarking extractive summarization strategies.
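A machine-learning combination of scoring strategies can be sketched as a weighted sum: each sentence gets one feature per strategy, and a learned weight vector combines them into a final rank. The weights and two-feature setup below are assumptions for illustration; the real system integrates twenty strategies with a proper learner:

```python
# Hedged sketch: combine several sentence-scoring strategies with
# learned weights and rank sentences by the combined score.
# Illustrative only; not the paper's tool.

def combined_score(feature_scores, weights):
    """Weighted sum of per-strategy scores for one sentence."""
    return sum(w * s for w, s in zip(weights, feature_scores))

def rank_sentences(sentence_features, weights):
    """Return sentence indices ordered by combined score, best first."""
    scored = [(combined_score(f, weights), i)
              for i, f in enumerate(sentence_features)]
    return [i for _, i in sorted(scored, reverse=True)]
```

In a trained system the weights would come from fitting against reference summaries rather than being hand-set.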


Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2014

An Approach for Learning and Construction of Expressive Ontology from Text in Natural Language

Ryan Ribeiro de Azevedo; Fred Freitas; Rodrigo G. C. Rocha; José Antônio Alves de Menezes; Cleyton Rodrigues; Gabriel de França Pereira e Silva

In this paper, we present an approach based on ontology learning and natural language processing for the automatic construction of expressive ontologies, specifically in OWL DL with ALC expressivity, from natural language text. The viability of our approach is demonstrated through the generation of descriptions of complex axioms from concepts defined by users and glossaries found on Wikipedia. We evaluated our approach in an experiment with input sentences enriched with hierarchy axioms, disjunction, conjunction, and negation, as well as existential and universal quantification to impose property restrictions. The results obtained show that our model is an effective solution for knowledge representation and the automatic construction of expressive ontologies, thereby assisting professionals involved in the processes of obtaining, constructing, and modeling domain knowledge.
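To illustrate the kind of output such an approach targets, the sketch below maps one simple sentence pattern to a description-logic subsumption axiom. Real ontology learning handles far richer constructions (disjunction, negation, quantified restrictions); the single regex pattern here is an assumption for illustration only:

```python
# Hedged sketch: translate the pattern "Every X is a Y." into an
# ALC-style hierarchy axiom "X ⊑ Y". Not the paper's system.
import re

def sentence_to_axiom(sentence):
    """Return 'X ⊑ Y' for sentences matching 'Every X is a/an Y.',
    or None when the sentence does not match the pattern."""
    m = re.match(r'Every (\w+) is an? (\w+)\.?$',
                 sentence.strip(), re.IGNORECASE)
    if not m:
        return None
    sub, sup = (w.capitalize() for w in m.groups())
    return f'{sub} ⊑ {sup}'
```

In OWL DL this axiom would be serialized as a `SubClassOf` statement between the two named classes.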


Proceedings of the 2011 Workshop on Historical Document Imaging and Processing | 2011

HistDoc v. 2.0: enhancing a platform to process historical documents

Rafael Dueire Lins; Gabriel de França Pereira e Silva; Andrei de Araújo Formiga

The first version of the HistDoc platform was designed as an ImageJ plugin to process images of historical documents. This paper presents the second version of HistDoc. Besides updating the image processing capabilities in a number of ways, including processing images of monochromatic documents and incorporating newer and better algorithms for the existing functionality, it allows document images to be batch processed in standalone mode on a single machine and in parallel on distributed architectures such as clusters and grids.


International Conference on Image Analysis and Recognition | 2010

HistDoc - a toolbox for processing images of historical documents

Gabriel de França Pereira e Silva; Rafael Dueire Lins; João Marcelo Monte da Silva

HistDoc is a software tool designed to process images of historical documents. It has two operation modes: a standalone mode, in which one can process one image at a time, and a batch mode, in which one can process thousands of documents automatically. The tool automatically detects noise present in the document image, including back-to-front interference (also called bleeding or show-through), and uses the best techniques to filter it out. It also removes noisy borders and salt-and-pepper degradation introduced during the digitization process. HistDoc also allows document binarization and image compression.
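Salt-and-pepper removal, one of the degradations mentioned above, is classically handled with a median filter. The sketch below applies a 3x3 median over interior pixels of a grayscale grid; it is the textbook technique, not HistDoc's actual (more sophisticated) filters:

```python
# Hedged sketch: 3x3 median filter for salt-and-pepper noise on a
# grayscale image stored as a list of lists. Textbook illustration.

def median_filter(image):
    """Replace each interior pixel with the median of its 3x3
    neighborhood; border pixels are left unchanged."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(image[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # median of the 9 values
    return out
```

An isolated white speck on a dark background is replaced by the neighborhood median, which is exactly why this filter suppresses salt-and-pepper noise while largely preserving edges.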

Collaboration


Dive into Gabriel de França Pereira e Silva's collaboration.

Top Co-Authors

Rafael Dueire Lins
Federal University of Pernambuco

Luciano de Souza Cabral
Federal University of Pernambuco

Rafael Ferreira
Federal University of Pernambuco

Fred Freitas
Federal University of Pernambuco

André R. Gomes e Silva
Federal University of Pernambuco