Bruno Tenório Ávila

Publication

Featured research published by Bruno Tenório Ávila.

document engineering | 2005

A fast orientation and skew detection algorithm for monochromatic document images

Bruno Tenório Ávila; Rafael Dueire Lins

Very often in the digitization process, documents are either not placed with the correct orientation or are rotated by small angles relative to the original image axis. These factors make images harder for human users to view, increase the complexity of any sort of automatic image recognition, degrade the performance of OCR tools, and increase the space needed for image storage. This paper presents a fast algorithm for orientation and skew detection in complex monochromatic document images, which is capable of detecting any document rotation with high precision.

acm symposium on applied computing | 2004

A new algorithm for removing noisy borders from monochromatic documents

Bruno Tenório Ávila; Rafael Dueire Lins

This paper presents a new and efficient algorithm for removing the noisy borders that digitisation with automatically fed scanners introduces into monochromatic images of documents.
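To make the task concrete, here is a minimal sketch of a naive baseline, not the algorithm from the paper: flood-fill from the image edges and erase every black region that touches an edge. The function name `remove_edge_components` and the list-of-lists image layout (1 = black, 0 = white) are assumptions made for this illustration.

```python
from collections import deque

def remove_edge_components(img):
    """Return a copy of img with every black (1) region touching the edge erased."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    # Seed the search with every black pixel on the outer frame of the image.
    queue = deque(
        (x, y)
        for y in range(h)
        for x in range(w)
        if (x in (0, w - 1) or y in (0, h - 1)) and out[y][x] == 1
    )
    for x, y in queue:
        out[y][x] = 0
    # Breadth-first flood fill over 4-connected black neighbours.
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and out[ny][nx] == 1:
                out[ny][nx] = 0
                queue.append((nx, ny))
    return out
```

A baseline like this also erases document content that happens to touch the border, which is one reason a dedicated algorithm is needed.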

international conference on image analysis and recognition | 2004

A New Algorithm for Skew Detection in Images of Documents

Rafael Dueire Lins; Bruno Tenório Ávila

Very frequently the digitisation of documents produces images rotated by small angles relative to the original image axis. The skew introduced makes the images harder for human users to view. Besides that, it increases the complexity of any sort of automatic image recognition, degrades the performance of OCR tools, and increases the space needed for image storage. Thus, skew correction is an important part of any document processing system and has been a matter of concern to researchers for almost two decades. The search for faster, good-quality solutions to this problem is still on. This paper presents an efficient algorithm for skew detection and correction in images of documents that include non-textual graphical elements, such as pictures and tables. The new algorithm was tested on over 10,000 images, yielding satisfactory performance.
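For readers unfamiliar with skew detection, the sketch below uses the classic projection-profile idea, which is a textbook technique and not the algorithm proposed in this paper. Each candidate angle is scored by how sharply the black pixels concentrate into rows once that skew is undone. The names `estimate_skew` and `synthetic_page`, and the representation of a page as a list of `(x, y)` black-pixel coordinates, are assumptions for illustration.

```python
import math
from collections import Counter

def estimate_skew(black_pixels, candidates_deg):
    """Return the candidate angle (degrees) giving the sharpest row profile."""
    best_angle, best_score = None, -1
    for angle in candidates_deg:
        slope = math.tan(math.radians(angle))
        # Undo a skew of `angle` by shifting each pixel back along its column.
        rows = Counter(round(y - x * slope) for x, y in black_pixels)
        # Aligned text lines pile pixels into few rows, so squared counts peak.
        score = sum(n * n for n in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

def synthetic_page(skew_deg, width=200, baselines=(20, 40, 60, 80)):
    """Hypothetical test input: horizontal rules drawn with a known skew."""
    slope = math.tan(math.radians(skew_deg))
    return [(x, round(y0 + x * slope)) for y0 in baselines for x in range(width)]
```

Calling `estimate_skew(synthetic_page(3), range(-5, 6))` recovers the 3 degree skew on this synthetic input. An exhaustive search over angles is slow on real pages, which is part of the motivation for faster methods.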

international conference on image analysis and recognition | 2006

BigBatch – an environment for processing monochromatic documents

Rafael Dueire Lins; Bruno Tenório Ávila; Andrei de Araújo Formiga

BigBatch is a processing environment designed to automatically process batches of millions of monochromatic images of documents generated by production line scanners. It removes noisy borders, checks and corrects orientation, calculates and compensates for the skew angle, crops the image to standardize document sizes, and finally compresses it into a user-defined file format. BigBatch encompasses the best recently developed algorithms for this kind of document image. BigBatch may work in either standalone or operator-assisted mode. In addition, in standalone mode BigBatch can run on clusters of workstations or on grids.

document engineering | 2005

A new rotation algorithm for monochromatic images

Bruno Tenório Ávila; Rafael Dueire Lins; Lamberto Oliveira

The classical rotation algorithm applied to monochromatic images introduces white holes in black areas, makes edges uneven, and disconnects neighboring elements. Several algorithms in the literature address only the white hole problem. This paper proposes a new algorithm that solves all three problems, producing better quality images.
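The white hole problem can be reproduced in a few lines. The sketch below assumes that the "classical" algorithm means forward mapping (each source pixel is pushed to its rounded destination) and contrasts it with standard inverse mapping, which fixes only the holes. Neither function is the algorithm proposed in the paper, and all names here are hypothetical.

```python
import math

def rotate_forward(img, deg):
    """Forward mapping: push each black source pixel to its rotated position."""
    h, w = len(img), len(img[0])
    cx, cy = w // 2, h // 2
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                dx, dy = x - cx, y - cy
                nx, ny = round(cx + dx * c - dy * s), round(cy + dx * s + dy * c)
                if 0 <= nx < w and 0 <= ny < h:
                    out[ny][nx] = 1
    return out

def rotate_inverse(img, deg):
    """Inverse mapping: pull each destination pixel from its nearest source pixel."""
    h, w = len(img), len(img[0])
    cx, cy = w // 2, h // 2
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            sx, sy = round(cx + dx * c + dy * s), round(cy - dx * s + dy * c)
            if 0 <= sx < w and 0 <= sy < h and img[sy][sx]:
                out[y][x] = 1
    return out

def count_holes(img):
    """Count white pixels whose four direct neighbours are all black."""
    h, w = len(img), len(img[0])
    return sum(
        1
        for y in range(1, h - 1)
        for x in range(1, w - 1)
        if not img[y][x]
        and img[y - 1][x] and img[y + 1][x] and img[y][x - 1] and img[y][x + 1]
    )

# A solid 20x20 black square centred in a 40x40 white image.
square = [[1 if 10 <= x < 30 and 10 <= y < 30 else 0 for x in range(40)] for y in range(40)]
holes_forward = count_holes(rotate_forward(square, 45))
holes_inverse = count_holes(rotate_inverse(square, 45))
```

On this square, forward mapping leaves isolated white pixels inside the rotated shape while inverse mapping leaves none. Inverse mapping does nothing for the uneven edges or the disconnected neighbors that the abstract also mentions.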

Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) | 2014

A New Sentence Similarity Method Based on a Three-Layer Sentence Representation

Rafael Ferreira; Rafael Dueire Lins; Frederico Luiz Gonçalves de Freitas; Bruno Tenório Ávila; Steven J. Simske; Marcelo Riss

Sentence similarity methods are used to assess the degree of similarity between sentences. Many natural language applications, such as text summarization, information retrieval, text categorization, and machine translation, employ measures of sentence similarity. The existing approaches to this problem represent sentences as bag-of-words vectors or by the syntactic information of the words in the sentence. The similarity between sentences is calculated by composing the similarity between their words. Such schemes do not address two important concerns in the area, however: the semantic problem and word order. This paper proposes a new sentence similarity measure that attempts to address these problems by taking into account the lexical, syntactic, and semantic analysis of sentences. The proposed measure outperforms state-of-the-art systems by around 6% when tested on a standard, publicly available dataset.
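The word-order limitation of bag-of-words representations is easy to demonstrate. The sketch below is a plain cosine similarity over word counts, i.e. the kind of baseline the abstract criticizes, not the three-layer method the paper proposes; the function name `bow_cosine` and the whitespace tokenization are assumptions for illustration.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(n * n for n in va.values()) * sum(n * n for n in vb.values()))
    return dot / norm if norm else 0.0

# Same words, opposite meaning: the bag-of-words score cannot tell them apart.
score = bow_cosine("the dog bites the man", "the man bites the dog")
```

Here `score` is 1.0 even though the two sentences describe opposite events, because both reduce to the same word counts.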

document engineering | 2014

A platform for language independent summarization

Luciano de Souza Cabral; Rafael Dueire Lins; Rafael Fe Mello; Fred Freitas; Bruno Tenório Ávila; Steven J. Simske; Marcelo Riss

The text data available on the Internet is not only huge in volume but also diverse in subject, quality, and language. Such factors make it infeasible to efficiently scavenge useful information from it. Automatic text summarization is a possible solution for efficiently addressing this problem, because it aims to sieve the relevant information in documents by creating shorter versions of the text. However, most of the techniques and tools available for automatic text summarization are designed only for the English language, which is a severe restriction. The multilingual platforms that exist support at most two languages. This paper proposes a language-independent summarization platform that provides corpus acquisition, language classification, translation, and text summarization for 25 different languages.

ACM Transactions on The Web | 2016

W-tree: A Compact External Memory Representation for Webgraphs

Bruno Tenório Ávila; Rafael Dueire Lins

World Wide Web applications need to use, constantly update, and maintain large webgraphs for executing several tasks, such as calculating the web impact factor, finding hubs and authorities, performing link analysis by webometrics tools, and ranking webpages by web search engines. Such webgraphs need a large amount of main memory and frequently do not fit in it completely, even when compressed. Therefore, applications require the use of external memory. This article presents a new compact representation for webgraphs, called w-tree, which is designed specifically for external memory. It supports the execution of basic queries (e.g., full read, random read, and batch random read), set-oriented queries (e.g., superset, subset, equality, overlap, range, inlink, and co-inlink), and some advanced queries, such as edge reciprocal and hub and authority. Furthermore, a new layout tree designed specifically for webgraphs is also proposed, reducing the overall storage cost and allowing the random read query to be performed with an asymptotically faster runtime in the worst case. To validate the advantages of the w-tree, a series of experiments is performed to assess an implementation of the w-tree, comparing it to a compact main memory representation. The results obtained show that the w-tree is competitive in compression time, compression rate, and query time, and that set-oriented queries may execute several orders of magnitude faster than on its competitors. The results provide empirical evidence that it is feasible to use a compact external memory representation for webgraphs in real applications, contradicting the previous assumptions made by several researchers.
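To give a feel for some of the query names, the sketch below runs them on a plain in-memory dictionary of out-link sets. It says nothing about how the w-tree stores or answers them, and the exact semantics used in the article may differ: reading "co-inlink" as "pages that link to both targets" follows common webometrics usage and is an assumption, as are all function names.

```python
def inlinks(graph, page):
    """Pages that link to `page` (the inlink query)."""
    return {src for src, targets in graph.items() if page in targets}

def co_inlinks(graph, a, b):
    """Pages that link to both `a` and `b` (assumed meaning of co-inlink)."""
    return inlinks(graph, a) & inlinks(graph, b)

def is_reciprocal(graph, u, v):
    """True when both edges u -> v and v -> u exist (edge reciprocal)."""
    return v in graph.get(u, set()) and u in graph.get(v, set())

# Hypothetical four-page webgraph: each key maps to the set of pages it links to.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": set(),
    "d": {"c"},
}
```

With this toy graph, `inlinks(graph, "c")` is `{"a", "b", "d"}`, `co_inlinks(graph, "b", "c")` is `{"a"}`, and `is_reciprocal(graph, "a", "b")` is `True`. Each inlink query here scans the whole graph, which is the kind of cost a purpose-built representation aims to avoid.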

IEEE Transactions on Information Theory | 2017