Robson L. F. Cordeiro

Publication

Featured research published by Robson L. F. Cordeiro.

international conference on data engineering | 2010

Finding Clusters in subspaces of very large, multi-dimensional datasets

Robson L. F. Cordeiro; Agma J. M. Traina; Christos Faloutsos; Caetano Traina

We propose the Multi-resolution Correlation Cluster detection (MrCC), a novel, scalable method to detect correlation clusters in multi-dimensional data of roughly 5 to 30 axes. Existing methods typically exhibit super-linear behavior in terms of space or execution time. MrCC employs a novel data structure based on multi-resolution and improves on previous approaches in four ways: (a) it finds clusters that stand out in the data in a statistical sense; (b) it is linear in running time and memory usage with respect to the number of data points and the dimensionality of the subspaces where clusters exist; (c) it is linear in memory usage and quasi-linear in running time with respect to space dimensionality; and (d) it is accurate, deterministic, robust to noise, does not require the number of clusters as an input parameter, does not perform distance calculations, and is able to detect clusters in subspaces generated by original axes or linear combinations of original axes, including space rotation. We performed experiments on synthetic data ranging from 5 to 30 axes and from 12k to 250k points, and MrCC outperformed five recent related methods in running time, being on average 10 times faster than the competitors that also presented high accuracy results for every tested dataset. Regarding real data, MrCC found clusters at least 9 times faster than the competitors, improving on their accuracy by up to 34 percent.

international conference on data mining | 2010

QMAS: Querying, Mining and Summarization of Multi-modal Databases

Robson L. F. Cordeiro; Fan Guo; Donna Haverkamp; James H. Horne; Ellen K. Hughes; Gunhee Kim; Agma J. M. Traina; Caetano Traina; Christos Faloutsos

Given a large collection of images, very few of which have labels, how can we guess the labels of the remaining majority, and how can we spot those images that need brand new labels, different from the existing ones? Current automatic labeling techniques usually scale super-linearly with the data size, and/or they fail when only a tiny amount of labeled data is provided. In this paper, we propose QMAS (Querying, Mining And Summarization of Multi-modal Databases), a fast solution to the following problems: (i) low-labor labeling (L3) – given a collection of images, very few of which are labeled with keywords, find the most suitable labels for the remaining ones, and (ii) mining and attention routing – in the same setting, find clusters, the top-N_O outlier images, and the top-N_R representative images. We report experiments on real satellite images, two large sets (1.5GB and 2.25GB) of proprietary images and a smaller set (17MB) of public images. We show that QMAS scales linearly with the data size, being up to 40 times faster than top competitors (GCap), obtaining better or equal accuracy. In contrast to other methods, QMAS does low-labor labeling (L3), that is, it works even with tiny initial label sets. It also solves both presented problems and spots tiles that potentially require new labels.

Information Sciences | 2017

ORFEL: Efficient detection of defamation or illegitimate promotion in online recommendation

Gabriel P. Gimenes; Robson L. F. Cordeiro; Jose F. Rodrigues-Jr

What if a successful company starts to receive a torrent of low-valued (one or two stars) recommendations in its mobile apps from multiple users within a short (say one month) period of time? Is it legitimate evidence that the apps have lost quality, or an intentional plan (via lockstep behavior) to steal market share through defamation? In the case of a systematic attack on one’s reputation, it might not be possible to manually discern between legitimate and fraudulent interaction within the huge universe of possibilities of user-product recommendation. Previous works have focused on this issue, but none of them took into account the context, modeling, and scale that we consider in this paper. Here, we propose the novel method Online-Recommendation Fraud ExcLuder (ORFEL) to detect defamation and/or illegitimate promotion of online products by using vertex-centric asynchronous parallel processing of bipartite (users-products) graphs. With an innovative algorithm, our results demonstrate both efficacy and efficiency – over 95% of potential attacks were detected, and ORFEL was at least two orders of magnitude faster than the state-of-the-art. Based on a novel methodology, our main contributions are: (1) a new algorithmic solution; (2) a scalable approach; and (3) a novel context and modeling of the problem, which now addresses both defamation and illegitimate promotion. Our work deals with relevant issues of the Web 2.0, potentially augmenting the credibility of online recommendation to prevent losses to both customers and vendors.

international world wide web conferences | 2013

Analysis of large scale climate data: how well climate change models and data from real sensor networks agree?

Santiago Augusto Nunes; Luciana A. S. Romani; Ana Maria Heuminski de Ávila; Priscila Pereira Coltri; Caetano Traina; Robson L. F. Cordeiro; Elaine P. M. de Sousa; Agma J. M. Traina

Research on global warming and climate change has attracted considerable attention from the scientific community and the media in general, mainly due to the social and economic impacts they pose over the entire planet. Climate change simulation models have been developed and improved to provide reliable data, which are employed to forecast effects of increasing emissions of greenhouse gases on a future global climate. The data generated by each model simulation amount to terabytes, and demand fast and scalable methods to process them. In this context, we propose a new process of analysis aimed at discriminating between the temporal behavior of the data generated by climate models and the real climate observations gathered from ground-based meteorological station networks. Our approach combines fractal data analysis and the monitoring of real and model-generated data streams to detect deviations in the intrinsic correlation among the time series defined by different climate variables. Our measurements were made using series from a regional climate model and the corresponding real data from a network of sensors from meteorological stations existing in the analyzed region. The results show that our approach can correctly discriminate the data either as real or as simulated, even when statistical tests fail. Those results suggest that there is still room for improvement of the state-of-the-art climate change models, and that the fractal-based concepts may contribute to their improvement, besides being a fast, parallelizable, and scalable approach.

international symposium on multimedia | 2013

Efficient Execution of Conjunctive Complex Queries on Big Multimedia Databases

Karina Fasolin; Renato Fileto; Marcelo Krugery; Daniel S. Kaster; Mônica Ribeiro Porto Ferreira; Robson L. F. Cordeiro; Agma J. M. Traina; Caetano Traina

This paper proposes an approach to efficiently execute conjunctive queries on big complex data together with their related conventional data. The basic idea is to horizontally fragment the database according to criteria frequently used in query predicates. The collection of fragments is indexed to efficiently find the fragment(s) whose contents satisfy some query predicate(s). The contents of each fragment are then indexed as well, to support efficient filtering of the fragment data according to other query predicate(s) conjunctively connected to the former. This strategy has been applied to a collection of more than 106 million images together with their related conventional data. Experimental results show considerable performance gain of the proposed approach for queries with conventional and similarity-based predicates, compared to the use of a unique metric index for the entire database contents.
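The two-level idea in this abstract (pick the fragment by a conventional predicate, then filter inside it by similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `FragmentedStore`, its method names, and the use of a plain dictionary plus a linear scan in place of real fragment and metric indexes are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


class FragmentedStore:
    """Hypothetical sketch: horizontal fragments keyed by a conventional attribute."""

    def __init__(self) -> None:
        # Fragment "index": attribute value -> feature vectors stored in that fragment.
        self._fragments: Dict[Hashable, List[Vector]] = defaultdict(list)

    def add(self, attribute: Hashable, features: Vector) -> None:
        """Place a record in the fragment selected by its conventional attribute."""
        self._fragments[attribute].append(features)

    def range_query(self, attribute: Hashable, center: Vector, radius: float) -> List[Vector]:
        """Conjunctive query: attribute equality AND Euclidean distance <= radius.

        Only the matching fragment is scanned. A real system would use a metric
        index inside the fragment instead of this linear scan (assumption).
        """
        candidates = self._fragments.get(attribute, [])
        return [f for f in candidates if math.dist(f, center) <= radius]


store = FragmentedStore()
store.add("2012", (0.0, 0.0))
store.add("2012", (3.0, 4.0))
store.add("2013", (0.1, 0.0))
print(store.range_query("2012", (0.0, 0.0), 1.0))
```

The point of the design is that the similarity predicate never touches records outside the selected fragment, which is where the reported gain over a single index for the whole database would come from.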

international conference on enterprise information systems | 2016

On the Support of a Similarity-enabled Relational Database Management System in Civilian Crisis Situations

Paulo H. Oliveira; Antonio C. Fraideinberze; Natan A. Laverde; Hugo Gualdron; André S. Gonzaga; Lucas D. R. Ferreira; Willian D. Oliveira; Jose F. Rodrigues-Jr.; Robson L. F. Cordeiro; Caetano Traina; Agma J. M. Traina; Elaine P. M. de Sousa

Crowdsourcing solutions can be helpful to extract information from disaster-related data during crisis management. However, certain information can only be obtained through similarity operations. Some of them also depend on additional data stored in a Relational Database Management System (RDBMS). In this context, several works focus on crisis management supported by data. Nevertheless, none of them provide a methodology for employing a similarity-enabled RDBMS in disaster-relief tasks. To fill this gap, we introduce a methodology together with the Data-Centric Crisis Management (DCCM) architecture, which employs our methods over a similarity-enabled RDBMS. We evaluate our proposal through three tasks: classification of incoming data regarding current events, identifying relevant information to guide rescue teams; filtering of incoming data, enhancing the decision support by removing near-duplicate data; and similarity retrieval of historical data, supporting analytical comprehension of the crisis context. To make it possible, similarity-based operations were implemented within one popular, open-source RDBMS. Results using real data from Flickr show that our proposal is feasible for real-time applications. In addition to high performance, accurate results were obtained with a proper combination of techniques for each task. Hence, we expect our work to provide a framework for further developments on crisis management solutions.

international conference on enterprise information systems | 2018

TendeR-Sims - Similarity Retrieval System for Public Tenders.

Guilherme Q. Vasconcelos; Guilherme F. Zabot; Daniel Mário de Lima; José F. Rodrigues; Caetano Traina; Daniel S. Kaster; Robson L. F. Cordeiro

TendeR-Sims (Tender Retrieval by Similarity) is a system that helps to search a database for the lots of requests for tender that a company can satisfy, filtering out irrelevant lots so that companies can easily discover the contracts they can win. The system implements the Similarity-aware Relational Division Operator in a commercial Relational Database Management System (RDBMS), and compares products by combining a path distance in a preprocessed ontology with a textual distance. TendeR-Sims focuses on answering the following query: select the lots where a company has a similar enough item for each of all required items. We evaluated our proposed system employing a dataset composed of product catalogs of Brazilian companies in the food market and real requests for tenders with known results. In the presented experiments, TendeR-Sims achieved up to 66% cost reduction at 90% recall when compared to the ground truth.
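The query stated in this abstract ("select the lots where a company has a similar enough item for each of all required items") can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the threshold, and the word-set Jaccard distance are hypothetical stand-ins, and the system's actual distance (ontology path distance combined with a textual distance) is not reproduced here.

```python
from typing import Callable, Dict, List


def jaccard_distance(a: str, b: str) -> float:
    """Toy textual distance on word sets (assumed stand-in, not the system's distance)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return 1.0 - len(wa & wb) / len(wa | wb)


def satisfiable_lots(
    lots: Dict[str, List[str]],
    catalog: List[str],
    distance: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Return ids of lots where EVERY required item has SOME catalog item within threshold."""
    return [
        lot_id
        for lot_id, required in lots.items()
        if all(any(distance(r, c) <= threshold for c in catalog) for r in required)
    ]


lots = {
    "lot1": ["white rice 5kg", "black beans"],
    "lot2": ["white rice 5kg", "olive oil"],
}
catalog = ["white rice 5kg bag", "black beans"]
print(satisfiable_lots(lots, catalog, jaccard_distance, threshold=0.5))
```

The nested `all(any(...))` is the "for all required items, there exists a similar catalog item" condition; an exact-match division is the special case where the threshold is zero.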

international conference on data mining | 2015

StructMatrix: Large-Scale Visualization of Graphs by Means of Structure Detection and Dense Matrices

Hugo Gualdron; Robson L. F. Cordeiro; José Fernando Rodrigues

Given a large-scale graph with millions of nodes and edges, how to reveal macro patterns of interest, like cliques, bi-partite cores, stars, and chains? Furthermore, how to visualize such patterns altogether, getting insights from the graph to support wise decision-making? Although there are many algorithmic and visual techniques to analyze graphs, none of the existing approaches is able to present the structural information of graphs at large scale. Hence, this paper describes StructMatrix, a methodology aimed at highly scalable visual inspection of graph structures with the goal of revealing macro patterns of interest. StructMatrix combines algorithmic structure detection and adjacency matrix visualization to present cardinality, distribution, and relationship features of the structures found in a given graph. We performed experiments in real, large-scale graphs with up to one million nodes and millions of edges. StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia and DBLP) have characterizations that reflect the nature of their corresponding domains; our findings have not been seen in the literature so far. We expect that our technique will bring deeper insights into large graph mining, leveraging their use for decision making.

international conference on enterprise information systems | 2018

Classification Analysis of NDVI Time Series in Metric Spaces for Sugarcane Identification.

Lucas Felipe Kunze; Thábata Amaral; Leonardo Mauro Pereira Moraes; Jadson José Monteiro Oliveira; Altamir Gomes Bispo Junior; Elaine P. M. de Sousa; Robson L. F. Cordeiro

In Brazil, agribusiness is an important sector of the economy, since it provides a substantial part of the country’s Gross Domestic Product. Besides that, interest in biofuels has grown, considering they make viable the use of renewable energy. Brazil is the world’s largest producer of sugarcane, which enables a large ethanol production. Thus, monitoring agricultural areas is important to support decision making. However, the amount of generated and stored data about these areas has been increasing in such a way that it far exceeds the human capacity to manually analyze and extract information from it. That is why automatic and scalable data mining approaches are necessary. This work focuses on the sugarcane classification task, taking as input NDVI time series extracted from remote sensing images. Existing related works propose to analyze non-metric feature spaces using the DTW distance function as a basis. Here we demonstrate that analyzing the multidimensional space with Minkowski distance provides better results, considering a variety of classifiers. kNN using L2 distance performed similarly or better than using DTW. We also demonstrate a data configuration with geolocation for training XGBoost, with results better than state-of-the-art.
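To make the comparison in this abstract concrete, the sketch below runs the same kNN classifier with either the L2 (Minkowski, p = 2) distance or a textbook DTW distance. It is a minimal illustration, not the paper's pipeline: the function names are hypothetical and the NDVI values and labels are made up for the example.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Series = Sequence[float]


def l2(a: Series, b: Series) -> float:
    """Euclidean (Minkowski p=2) distance; series must have equal length."""
    if len(a) != len(b):
        raise ValueError("L2 requires series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: Series, b: Series) -> float:
    """Textbook dynamic time warping with absolute-difference cost, O(len(a) * len(b))."""
    inf = math.inf
    prev = [0.0] + [inf] * len(b)  # row 0 of the cumulative-cost table
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def knn_predict(
    train: List[Tuple[Series, str]],
    query: Series,
    k: int,
    distance: Callable[[Series, Series], float],
) -> str:
    """Majority label among the k training series closest to the query."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up NDVI-like series, for illustration only.
train = [
    ([0.2, 0.5, 0.8, 0.5], "sugarcane"),
    ([0.3, 0.3, 0.3, 0.3], "other"),
]
query = [0.2, 0.6, 0.8, 0.4]
print(knn_predict(train, query, k=1, distance=l2))
print(knn_predict(train, query, k=1, distance=dtw))
```

Because the distance is passed in as a parameter, swapping L2 for DTW changes nothing else in the classifier. L2 costs time linear in the series length, whereas this DTW is quadratic.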

advances in databases and information systems | 2018

On the Support of the Similarity-Aware Division Operator in a Commercial RDBMS

Guilherme Q. Vasconcelos; Daniel S. Kaster; Robson L. F. Cordeiro

The division operator from the relational algebra allows simple and intuitive representation of queries with the concept of “for all”, and thus it is required in many real applications. However, the relational division is unable to support the needs of modern applications that manipulate complex data, such as images, audio, long texts, genetic sequences, etc. These data are better compared by similarity, whereas relational algebra always compares data by equality or inequality. Recent works focus on extending relational operators to support similarity comparisons and their inclusion in relational database management systems. This work incorporates and studies the behavior of several similarity-aware division algorithms in a commercial RDBMS. We compared the two state-of-the-art algorithms against several SQL statements and found when to use each one of them in order to improve query execution time. We then propose an extension of the SQL syntax and the query analyzer to support this new operator.
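For readers unfamiliar with the operator, the classical, equality-based relational division can be sketched as follows. The function name `divide` and the supplier/part data are hypothetical; the similarity-aware algorithms studied in the paper are not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def divide(dividend: Iterable[Tuple[Hashable, Hashable]], divisor: Iterable[Hashable]) -> Set[Hashable]:
    """Classical relational division: keys paired with ALL divisor values, compared by equality."""
    required = set(divisor)
    seen: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for key, value in dividend:
        seen[key].add(value)
    # A key qualifies when the required values are a subset of the values it is paired with.
    return {key for key, values in seen.items() if required <= values}


# Hypothetical data: which suppliers supply every required part?
supplies = [
    ("s1", "bolt"), ("s1", "nut"),
    ("s2", "bolt"),
    ("s3", "nut"), ("s3", "bolt"), ("s3", "screw"),
]
print(sorted(divide(supplies, ["bolt", "nut"])))
```

The subset test `required <= values` relies on exact equality of values, which is the limitation the abstract points out for complex data such as images or long texts.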
