Otávio Augusto Bizetto Penatti
Samsung
Publications
Featured research published by Otávio Augusto Bizetto Penatti.
Journal of Visual Communication and Image Representation | 2012
Otávio Augusto Bizetto Penatti; Eduardo Valle; Ricardo da Silva Torres
This paper presents a comparative study of color and texture descriptors considering the Web as the environment of use. We take into account the diversity and large-scale aspects of the Web by considering a large number of descriptors (24 color and 28 texture descriptors, including both traditional and recently proposed ones). The evaluation is made on two levels: a theoretical analysis in terms of algorithm complexity, and an experimental comparison considering efficiency and effectiveness. The experimental comparison contrasts the performance of the descriptors on small-scale datasets and on a large heterogeneous database containing more than 230 thousand images. Although there is a significant correlation between descriptor performance in the two settings, there are notable deviations, which must be taken into account when selecting descriptors for large-scale tasks. An analysis of the correlation is provided for the best descriptors, which hints at the best opportunities for using them in combination.
Computer Vision and Pattern Recognition | 2015
Otávio Augusto Bizetto Penatti; Keiller Nogueira; Jefersson Alex dos Santos
In this paper, we evaluate the generalization power of deep features (ConvNets) in two new scenarios: aerial and remote sensing image classification. We experimentally evaluate ConvNets trained to recognize everyday objects on the classification of aerial and remote sensing images. ConvNets obtained the best results for aerial images, while for remote sensing they performed well but were outperformed by low-level color descriptors, such as BIC. We also present a correlation analysis, showing the potential for combining/fusing different ConvNets with other descriptors, or even for combining multiple ConvNets. A preliminary set of experiments fusing ConvNets obtains state-of-the-art results for the well-known UCMerced dataset.
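The fusion mentioned above can be sketched minimally as follows. This is an illustration of one common combination scheme, concatenating L2-normalized feature vectors so no single descriptor dominates by scale; it is an assumption for illustration, not necessarily the exact fusion method evaluated in the paper.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def fuse_features(*feature_vectors):
    """Early fusion: normalize each descriptor, then concatenate.

    Normalizing first keeps one descriptor from dominating the
    combined representation because of its magnitude.
    """
    fused = []
    for vec in feature_vectors:
        fused.extend(l2_normalize(vec))
    return fused

# Hypothetical toy vectors standing in for real descriptors.
convnet_feat = [0.5, 1.2, 0.3]   # e.g. penultimate-layer activations
bic_feat = [3.0, 4.0]            # e.g. a low-level color descriptor
fused = fuse_features(convnet_feat, bic_feat)
```

The fused vector can then be fed to any standard classifier.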
Pattern Recognition | 2017
Keiller Nogueira; Otávio Augusto Bizetto Penatti; Jefersson Alex dos Santos
We present an analysis of three possible strategies for exploiting the power of existing convolutional neural networks (ConvNets or CNNs) in scenarios different from those in which they were trained: full training, fine tuning, and using ConvNets as feature extractors. In many applications, especially remote sensing, it is not feasible to fully design and train a new ConvNet, as this usually requires a considerable amount of labeled data and demands high computational costs. Therefore, it is important to understand how to make better use of existing ConvNets. We perform experiments with six popular ConvNets on three remote sensing datasets. We also compare the ConvNets in each strategy with existing descriptors and with state-of-the-art baselines. Results indicate that fine tuning tends to be the best-performing strategy; in fact, using the features from the fine-tuned ConvNet with a linear SVM obtains the best results. We also achieved state-of-the-art results for the three datasets used.
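The three strategies differ mainly in which layers have their weights updated. A framework-agnostic toy sketch (the layer names and the frozen-prefix depth are hypothetical, chosen only to make the distinction concrete):

```python
# Toy stand-in for a pretrained ConvNet: a list of named layers,
# each with a flag saying whether its weights will be updated.
def make_convnet():
    return [{"name": n, "trainable": True}
            for n in ["conv1", "conv2", "conv3", "fc1", "fc_out"]]

def full_training(model):
    # Strategy 1: train everything from scratch (all layers updated).
    for layer in model:
        layer["trainable"] = True
    return model

def fine_tuning(model, frozen_prefix=2):
    # Strategy 2: freeze the early, more generic layers and update
    # only the later, more task-specific ones.
    for i, layer in enumerate(model):
        layer["trainable"] = i >= frozen_prefix
    return model

def feature_extraction(model):
    # Strategy 3: freeze the whole network and read activations from
    # an inner layer as a fixed descriptor (e.g. to feed a linear SVM).
    for layer in model:
        layer["trainable"] = False
    return model
```

In a real framework the `trainable` flag corresponds to, e.g., disabling gradient updates for the frozen parameters.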
Pattern Recognition | 2014
Otávio Augusto Bizetto Penatti; Fernanda B. Silva; Eduardo Valle; Valérie Gouet-Brunet; Ricardo da Silva Torres
We present word spatial arrangement (WSA), an approach to represent the spatial arrangement of visual words under the bag-of-visual-words model. It relies on a simple idea: encoding the relative position of visual words by splitting the image space into quadrants using each detected point as origin. WSA generates compact feature vectors and is flexible: it can be used for image retrieval and classification, works with hard or soft assignment, and requires no pre- or post-processing for spatial verification. Experiments in the retrieval scenario show the superiority of WSA over Spatial Pyramids. Experiments in the classification scenario show a reasonable compromise between the two methods, with Spatial Pyramids generating larger feature vectors, while WSA provides adequate performance with much more compact features. As WSA encodes only the spatial information of visual words and not their frequency of occurrence, the results indicate the importance of such information for visual categorization.
Highlights:
- Spatial arrangement of visual words (WSA) for image retrieval and classification.
- WSA generates vectors more compact than traditional spatial pooling methods.
- WSA outperforms Spatial Pyramids in the retrieval scenario.
- WSA presents adequate performance in the classification scenario.
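The quadrant-splitting idea can be sketched as below: every detected point is taken in turn as the origin, and each visual word occurrence is counted in the quadrant where it falls. This is a simplified sketch of the core idea only; the actual WSA descriptor compacts these counts further.

```python
def quadrant(origin, point):
    """Quadrant of `point` relative to `origin` (1..4, image
    coordinates with y growing downward)."""
    ox, oy = origin
    px, py = point
    if px >= ox and py < oy:
        return 1   # top-right
    if px < ox and py < oy:
        return 2   # top-left
    if px < ox and py >= oy:
        return 3   # bottom-left
    return 4       # bottom-right

def wsa_counts(points, words, vocab_size):
    """WSA-style spatial counts: taking each detected point as the
    origin, count in which quadrant every other visual word
    occurrence falls.

    `points` is a list of (x, y) positions and `words[i]` is the
    visual word assigned to points[i]. Returns a vocab_size x 4
    table of quadrant counts.
    """
    counts = [[0] * 4 for _ in range(vocab_size)]
    for i, origin in enumerate(points):
        for j, point in enumerate(points):
            if i == j:
                continue
            counts[words[j]][quadrant(origin, point) - 1] += 1
    return counts
```

Note that only *where* words occur is recorded, not how often, matching the observation above that WSA encodes spatial information rather than frequency.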
Image and Vision Computing | 2014
Daniel Carlos Guimarães Pedronette; Otávio Augusto Bizetto Penatti; Ricardo da Silva Torres
In this paper, we present an unsupervised distance learning approach for improving the effectiveness of image retrieval tasks. We propose a Reciprocal kNN Graph algorithm that considers the relationships among ranked lists in the context of a k-reciprocal neighborhood. The similarity is propagated among neighbors considering the geometry of the dataset manifold. The proposed method can be used both for re-ranking and rank aggregation tasks. Unlike traditional diffusion process methods, which require matrix multiplication operations, our algorithm takes only a subset of ranked lists as input, presenting linear complexity in terms of computational and storage requirements. We conducted a large evaluation protocol involving shape, color, and texture descriptors, various datasets, and comparisons with other post-processing approaches. The re-ranking and rank aggregation algorithms yield better effectiveness than various state-of-the-art algorithms recently proposed in the literature, achieving bull's eye and MAP scores of 100% on the well-known MPEG-7 shape dataset.
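The notion of k-reciprocal neighborhood underlying the approach can be sketched as follows. This is a toy illustration of the concept only (two items are reciprocal neighbors when each appears in the other's top-k list, and shared reciprocal neighbors suggest they lie on the same region of the manifold); the `rerank_distance` scoring here is a hypothetical simplification, not the paper's algorithm.

```python
def top_k(ranked_list, k):
    return set(ranked_list[:k])

def are_reciprocal(ranked_lists, i, j, k):
    """i and j are k-reciprocal neighbors if each appears in the
    other's top-k ranked list."""
    return (j in top_k(ranked_lists[i], k)
            and i in top_k(ranked_lists[j], k))

def rerank_distance(ranked_lists, i, j, k):
    """Toy re-ranking score: smaller when i and j share many
    k-reciprocal neighbors."""
    shared = sum(
        1 for m in range(len(ranked_lists))
        if are_reciprocal(ranked_lists, i, m, k)
        and are_reciprocal(ranked_lists, j, m, k)
    )
    return 1.0 / (1 + shared)
```

Because only the top of each ranked list is inspected, no full similarity-matrix multiplication is needed, which is what keeps the cost linear in practice.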
International Conference on Multimedia Retrieval | 2012
Otávio Augusto Bizetto Penatti; Lin Tzy Li; Jurandy Almeida; Ricardo da Silva Torres
This paper presents a novel approach for video representation, called bag-of-scenes. The proposed method is based on dictionaries of scenes, which provide a high-level representation for videos. Scenes are elements with much more semantic information than local features, especially for geotagging videos using visual content. Thus, each component of the representation model has self-contained semantics and, hence, can be directly related to a specific place of interest. Experiments were conducted in the context of the MediaEval 2011 Placing Task. The reported results compare our strategy with those of other participants that used only visual content to accomplish this task. Despite our very simple way of generating the visual dictionary, which takes photos at random, the results show that our approach presents high accuracy relative to state-of-the-art solutions.
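The bag-of-scenes idea can be sketched minimally: each frame is assigned to its nearest scene prototype in the dictionary, and the video is represented by the histogram of scene activations. This sketch assumes hard assignment and Euclidean distance for simplicity; the feature values are hypothetical.

```python
def l2_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def bag_of_scenes(frame_features, scene_dictionary):
    """Bag-of-scenes sketch: assign each frame to its nearest scene
    prototype (hard assignment) and count activations per scene.

    `scene_dictionary` is a list of scene feature vectors; in the
    paper it is built from photos (e.g. sampled at random).
    """
    histogram = [0] * len(scene_dictionary)
    for frame in frame_features:
        nearest = min(range(len(scene_dictionary)),
                      key=lambda s: l2_dist(frame, scene_dictionary[s]))
        histogram[nearest] += 1
    return histogram
```

Each histogram bin corresponds to one scene with self-contained semantics, which is what allows a bin to be related directly to a place of interest.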
Brazilian Symposium on Computer Graphics and Image Processing | 2008
Otávio Augusto Bizetto Penatti; Ricardo da Silva Torres
This paper presents a comparative study of color descriptors for content-based image retrieval on the Web. Several image descriptors were compared theoretically, and the most relevant ones were implemented and tested on two different databases. The main goal was to identify the best descriptors for Web image retrieval. Descriptors are compared according to the complexity of their extraction and distance functions, the compactness of their feature vectors, and their ability to retrieve relevant images.
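As a concrete example of the kind of descriptor being compared, here is a global color histogram with an L1 distance: a classic baseline that trades some effectiveness for very cheap extraction and comparison. This is an illustrative sketch (8-bit RGB pixels assumed), not one of the specific descriptors evaluated in the study.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Global color histogram: quantize each RGB channel into a few
    bins and count pixels per quantized color. Assumes 8-bit channels."""
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel
               + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    total = len(pixels) or 1
    return [c / total for c in hist]  # normalize for size invariance

def l1_distance(h1, h2):
    """L1 distance between histograms: cheap to compute, the kind of
    extraction/comparison trade-off the study measures."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

With 4 bins per channel the feature vector has 64 dimensions; increasing `bins_per_channel` improves discrimination at the cost of compactness.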
Multimedia Tools and Applications | 2014
Lin Tzy Li; Daniel Carlos Guimarães Pedronette; Jurandy Almeida; Otávio Augusto Bizetto Penatti; Rodrigo Tripodi Calumby; Ricardo da Silva Torres
This paper proposes a rank aggregation framework for video multimodal geocoding. Textual and visual descriptions associated with videos are used to define ranked lists. These ranked lists are later combined, and the resulting ranked list is used to define appropriate locations for videos. An architecture that implements the proposed framework is designed. In this architecture, there are specific modules for each modality (e.g., textual and visual) that can be developed and evolved independently. Another component is a data fusion module responsible for seamlessly combining the ranked lists defined for each modality. We have validated the proposed framework in the context of the MediaEval 2012 Placing Task, whose objective is to automatically assign geographical coordinates to videos. The obtained results show how our multimodal approach improves the geocoding results when compared to methods that rely on a single modality (either textual or visual descriptors). We also show that the proposed multimodal approach yields results comparable to the best submissions to the Placing Task in 2012, using no extra information besides the available development/training data. Another contribution of this work is the proposal of a new effectiveness evaluation measure, based on distance scores that summarize how effective a designed/tested approach is, considering its overall result for a test dataset.
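Rank aggregation across modalities can be sketched with a simple positional scheme such as a Borda count, shown below as a minimal illustration; the actual fusion method in the framework may differ, and the location names are hypothetical.

```python
def borda_aggregate(ranked_lists):
    """Simple rank aggregation (Borda count): each list awards an
    item points inversely proportional to its position; items absent
    from a list score zero for that list. Returns the fused ranking."""
    scores = {}
    for ranked in ranked_lists:
        n = len(ranked)
        for pos, item in enumerate(ranked):
            scores[item] = scores.get(item, 0) + (n - pos)
    return sorted(scores, key=lambda item: -scores[item])

# Hypothetical per-modality ranked lists of candidate locations.
textual_rank = ["paris", "lyon", "nice"]
visual_rank = ["paris", "lyon", "rome"]
fused = borda_aggregate([textual_rank, visual_rank])
```

Because each modality only has to produce a ranked list, the modality-specific modules can indeed evolve independently of the fusion step.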
Iberoamerican Congress on Pattern Recognition | 2011
Otávio Augusto Bizetto Penatti; Eduardo Valle; Ricardo da Silva Torres
This paper presents a new approach to encode spatial-relationship information of visual words in the well-known visual dictionary model. The current most popular approach to describe images based on visual words is by means of bags-of-words, which do not encode any spatial information. We propose a graceful way to capture spatial-relationship information of visual words, encoding the spatial arrangement of every visual word in an image. Our experiments show the importance of the spatial information of visual words for image classification and show the gain in classification accuracy when using the new method. The proposed approach creates opportunities for further improvements in image description under the visual dictionary model.
International Conference on Multimedia Retrieval | 2014
Daniel Carlos Guimarães Pedronette; Otávio Augusto Bizetto Penatti; Rodrigo Tripodi Calumby; Ricardo da Silva Torres
This paper presents a novel unsupervised learning approach that takes into account the intrinsic dataset structure, which is represented in terms of the reciprocal neighborhood references found in different ranked lists. The proposed Reciprocal kNN Distance defines a more effective distance between two images and is used to improve the effectiveness of image retrieval systems. Several experiments were conducted for different image retrieval tasks involving shape, color, and texture descriptors. The proposed approach is also evaluated on multimodal retrieval tasks, considering visual and textual descriptors. Experimental results demonstrate the effectiveness of the proposed approach: the Reciprocal kNN Distance yields better results in terms of effectiveness than various state-of-the-art algorithms.