Elaine P. M. de Sousa

Publication

Featured research published by Elaine P. M. de Sousa.

Data Mining and Knowledge Discovery | 2007

A fast and effective method to find correlations among attributes in databases

Elaine P. M. de Sousa; Caetano Traina; Agma J. M. Traina; Leejay Wu; Christos Faloutsos

The problem of identifying meaningful patterns in a database lies at the very heart of data mining. A core objective of data mining processes is the recognition of inter-attribute correlations. Not only are correlations necessary for predictions and classifications (since rules would fail in the absence of patterns), but the identification of groups of mutually correlated attributes also expedites the selection of a representative subset of attributes, from which existing mappings allow others to be derived. In this paper, we describe a scalable, effective algorithm to identify groups of correlated attributes. This algorithm can handle non-linear correlations between attributes, and is not restricted to a specific family of mapping functions, such as the set of polynomials. We show the results of our evaluation of the algorithm applied to synthetic and real world datasets, and demonstrate that it is able to spot the correlated attributes. Moreover, the execution time of the proposed technique is linear in the number of elements and of correlations in the dataset.

New Generation Computing | 2007

Measuring evolving data streams' behavior through their intrinsic dimension

Elaine P. M. de Sousa; Agma J. M. Traina; Caetano Traina; Christos Faloutsos

The dimension of a dataset has a major impact on database management tasks such as indexing and query processing. The embedding dimension (i.e., the number of attributes of the dataset) usually overestimates the actual contribution of the attributes to the main characteristics of the data, as the typical assumption of uniform distribution and independence between attributes usually does not hold. In fact, due to dependencies and attribute correlations, real data are typically skewed and exhibit an intrinsic dimensionality much lower than the embedding dimension. Similarly, the intrinsic dimension can also be applied to improve data stream processing and analysis. Data streams are generated as sequences of events represented by a predetermined number of numerical attributes. Thus, without loss of generality, we can consider events as elements from a dimensional domain. This paper presents a fast, linear algorithm to measure the intrinsic dimension of a data stream on the fly, following its continuously changing behavior. Experimental studies show that the intrinsic dimension can be used to analyze attribute correlations. The results on well-understood datasets closely follow what is expected from the known behavior of the data.
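
As a rough illustration of what "intrinsic dimension" means here, the sketch below estimates the correlation fractal dimension of a static point set by box counting. This is an assumption made for illustration only: the abstract does not state which estimator the paper uses, and this batch version does not follow a stream on the fly. The function name `correlation_dimension`, the `levels` parameter, and the requirement that all coordinates lie in [0, 1) are hypothetical choices.

```python
import math
import random
from collections import Counter


def correlation_dimension(points, levels=range(1, 5)):
    """Estimate the correlation fractal dimension by box counting.

    Assumes every coordinate lies in [0, 1). For each grid with cell
    side r = 1 / 2**level, the sum of squared cell occupancies S(r) is
    computed; the estimate is the least-squares slope of log S(r)
    against log r.
    """
    log_r, log_s = [], []
    for level in levels:
        r = 1.0 / (2 ** level)
        # Count how many points fall into each grid cell of side r.
        counts = Counter(tuple(int(c / r) for c in p) for p in points)
        s = sum(n * n for n in counts.values())
        log_r.append(math.log(r))
        log_s.append(math.log(s))
    # Ordinary least-squares slope of log_s versus log_r.
    mean_x = sum(log_r) / len(log_r)
    mean_y = sum(log_s) / len(log_s)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_r, log_s))
    den = sum((x - mean_x) ** 2 for x in log_r)
    return num / den


# Synthetic data: two attributes that are perfectly correlated (a line)
# versus two independent, uniformly distributed attributes (a square).
rng = random.Random(0)
line = [(t, t) for t in (rng.random() for _ in range(5000))]
square = [(rng.random(), rng.random()) for _ in range(5000)]
print(correlation_dimension(line), correlation_dimension(square))
```

Both point sets have an embedding dimension of 2, but the correlated set should yield an estimate near 1 while the independent set should yield an estimate near 2, which is the gap between intrinsic and embedding dimension that the abstract describes.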

ACM Symposium on Applied Computing | 2006

Effective shape-based retrieval and classification of mammograms

Joaquim Cezar Felipe; Marcela Xavier Ribeiro; Elaine P. M. de Sousa; Agma J. M. Traina; Caetano Traina

This paper presents a new approach to support Computer-aided Diagnosis (CAD), aimed at assisting the classification and similarity retrieval of mammographic mass lesions based on shape content. We have tested classical algorithms for automatic segmentation of this kind of image, but they are usually not precise enough to generate contours accurate enough for lesion classification based on shape analysis. Thus, in this work, we have used Zernike moments for invariant pattern recognition within regions of interest (ROIs), without previous segmentation of images. A new data mining algorithm that generates statistical-based association rules is used to identify representative features that discriminate the disease classes of images. In order to minimize the computational effort, an algorithm based on fractal theory is applied to reduce the dimension of feature vectors. K-nearest neighbor retrieval was applied to a database containing images excerpted from previously classified digitized mammograms presenting breast lesions. The results reveal that our approach allows fast and effective feature extraction and is robust and suitable for analyzing this kind of image.

ACM Symposium on Applied Computing | 2006

Evaluating the intrinsic dimension of evolving data streams

Elaine P. M. de Sousa; Agma J. M. Traina; Caetano Traina; Christos Faloutsos

Data streams are fundamental in several data processing applications involving large amounts of data generated continuously as a sequence of events. Frequently, such events are not stored, so the data are analyzed and queried as they arrive and discarded right away. In many applications these events are represented by a predetermined number of numerical attributes. Thus, without loss of generality, we can consider events as elements from a dimensional domain. A sequence of events in a data stream can be characterized by its intrinsic dimension, which in dimensional datasets is usually lower than the embedding dimensionality. As the intrinsic dimension can be used to improve the performance of algorithms handling dimensional data (especially query optimization), measuring it is relevant to improving data stream processing and analysis as well. Moreover, it can also be useful to forecast data behavior. Hence, we present an algorithm able to measure the intrinsic dimension of a data stream on the fly, following its continuously changing behavior. We also present experimental studies, using both real and synthetic data streams, showing that the results on well-understood datasets closely follow what is expected from the known behavior of the data.

International World Wide Web Conferences | 2013

Analysis of large scale climate data: how well climate change models and data from real sensor networks agree?

Santiago Augusto Nunes; Luciana A. S. Romani; Ana Maria Heuminski de Ávila; Priscila Pereira Coltri; Caetano Traina; Robson L. F. Cordeiro; Elaine P. M. de Sousa; Agma J. M. Traina

Research on global warming and climate change has attracted considerable attention from the scientific community and from the media in general, mainly due to the social and economic impacts they pose over the entire planet. Climate change simulation models have been developed and improved to provide reliable data, which are employed to forecast the effects of increasing emissions of greenhouse gases on a future global climate. The data generated by each model simulation amount to terabytes, and demand fast and scalable methods to process them. In this context, we propose a new process of analysis aimed at discriminating between the temporal behavior of the data generated by climate models and the real climate observations gathered from ground-based meteorological station networks. Our approach combines fractal data analysis and the monitoring of real and model-generated data streams to detect deviations in the intrinsic correlation among the time series defined by different climate variables. Our measurements were made using series from a regional climate model and the corresponding real data from a network of sensors at meteorological stations in the analyzed region. The results show that our approach can correctly discriminate the data as either real or simulated, even when statistical tests fail. Those results suggest that there is still room for improvement in state-of-the-art climate change models, and that fractal-based concepts may contribute to their improvement, besides being a fast, parallelizable, and scalable approach.

International Conference on Enterprise Information Systems | 2016

On the Support of a Similarity-enabled Relational Database Management System in Civilian Crisis Situations

Paulo H. Oliveira; Antonio C. Fraideinberze; Natan A. Laverde; Hugo Gualdron; André S. Gonzaga; Lucas D. R. Ferreira; Willian D. Oliveira; F Jose Rodrigues-Jr.; Robson L. F. Cordeiro; Caetano Traina; Agma J. M. Traina; Elaine P. M. de Sousa

Crowdsourcing solutions can be helpful to extract information from disaster-related data during crisis management. However, certain information can only be obtained through similarity operations. Some of them also depend on additional data stored in a Relational Database Management System (RDBMS). In this context, several works focus on crisis management supported by data. Nevertheless, none of them provides a methodology for employing a similarity-enabled RDBMS in disaster-relief tasks. To fill this gap, we introduce a methodology together with the Data-Centric Crisis Management (DCCM) architecture, which employs our methods over a similarity-enabled RDBMS. We evaluate our proposal through three tasks: classification of incoming data regarding current events, identifying relevant information to guide rescue teams; filtering of incoming data, enhancing the decision support by removing near-duplicate data; and similarity retrieval of historical data, supporting analytical comprehension of the crisis context. To make this possible, similarity-based operations were implemented within one popular, open-source RDBMS. Results using real data from Flickr show that our proposal is feasible for real-time applications. In addition to high performance, accurate results were obtained with a proper combination of techniques for each task. Hence, we expect our work to provide a framework for further developments in crisis management solutions.

International Geoscience and Remote Sensing Symposium | 2014

Land use temporal analysis through clustering techniques on satellite image time series

Renata Ribeiro do Valle Gonçalves; Jurandir Zullo; Bruno Ferraz do Amaral; Priscila Pereira Coltri; Elaine P. M. de Sousa; Luciana A. S. Romani

Satellite image time series have been used to study the land surface, for example to identify forest, water, and urban areas, as well as for meteorological applications. Clustering techniques on multivariate time series can also be used for knowledge discovery in large remote sensing databases. In this study, a clustering technique was applied to three-dimensional time series of NDVI, albedo and surface temperature from AVHRR/NOAA satellite images to map the variability of land use. This approach proved suitable for the temporal analysis of land use. Additionally, the technique can be used to identify and analyze the dynamics of land use and cover, making it useful to support research in agriculture even with satellite images of low spatial resolution. The possibility of extracting time series from satellite images, analyzing them through data mining techniques such as clustering, and visualizing the results geospatially is an important advance in, and support for, agricultural monitoring tasks.
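
The abstract does not name the clustering algorithm, so the sketch below uses plain k-means purely as a stand-in to show how multivariate pixel time series could be grouped. Everything in it is an assumption for illustration: each pixel is represented by the concatenation of its NDVI, albedo and surface temperature series, the values are invented and assumed to be pre-scaled to comparable ranges, and the names `kmeans` and `squared_distance` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(series, k, iterations=20):
    """Cluster vectors with k-means; returns one cluster label per vector.

    Uses deterministic farthest-first initialization: the first vector
    seeds the first centroid, and each further centroid is the vector
    farthest from the centroids chosen so far.
    """
    centroids = [list(series[0])]
    while len(centroids) < k:
        farthest = max(
            series,
            key=lambda s: min(squared_distance(s, c) for c in centroids),
        )
        centroids.append(list(farthest))
    labels = [0] * len(series)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every series.
        labels = [
            min(range(k), key=lambda j: squared_distance(s, centroids[j]))
            for s in series
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented pixels: three time steps each of NDVI, albedo, temperature.
pixels = [
    [0.80, 0.75, 0.30, 0.15, 0.16, 0.22, 0.40, 0.45, 0.60],  # vegetated
    [0.78, 0.74, 0.33, 0.15, 0.17, 0.21, 0.41, 0.44, 0.58],  # vegetated
    [0.82, 0.77, 0.28, 0.14, 0.16, 0.23, 0.39, 0.46, 0.61],  # vegetated
    [0.10, 0.12, 0.11, 0.30, 0.31, 0.30, 0.70, 0.72, 0.71],  # built-up
    [0.11, 0.10, 0.12, 0.29, 0.30, 0.31, 0.69, 0.71, 0.72],  # built-up
    [0.09, 0.11, 0.10, 0.31, 0.32, 0.30, 0.71, 0.73, 0.70],  # built-up
]
labels = kmeans(pixels, k=2)
print(labels)
```

Mapping each label back to the pixel's geographic position would give the kind of land use map the abstract refers to; that step is omitted here.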

European Conference on Principles of Data Mining and Knowledge Discovery | 2007

A density-biased sampling technique to improve cluster representativeness

Ana Paula Appel; Adriano Arantes Paterlini; Elaine P. M. de Sousa; Agma J. M. Traina; Caetano Traina

The volume and complexity of data collected by modern applications have grown significantly, leading to increasingly costly operations for both data manipulation and analysis. Sampling is a useful data reduction technique for bringing the data down to a more manageable volume. Uniform sampling has been widely used but, in datasets exhibiting skewed cluster distribution, biased sampling shows better results. This paper presents the BBS (Biased Box Sampling) algorithm, which aims at keeping the skewed tendency of the clusters from the original data. We also present experimental results obtained with the proposed BBS algorithm.
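
The abstract gives no details of BBS itself, so the following is only a generic sketch of grid-based density-biased sampling, not the authors' algorithm. It assumes points are partitioned into axis-aligned boxes and that each box receives a share of the sample proportional to its point count raised to a tunable power. The function name `density_biased_sample` and the parameters `cell_size`, `sample_size` and `exponent` are hypothetical.

```python
import random
from collections import defaultdict


def density_biased_sample(points, cell_size, sample_size, exponent=0.5, seed=0):
    """Sample points box by box, with a quota that depends on box density.

    Each box holding n points gets a quota proportional to n ** exponent.
    With exponent=1 the quotas match uniform sampling in expectation;
    with exponent<1 sparse boxes are over-represented so that small
    clusters are less likely to vanish from the sample.
    """
    rng = random.Random(seed)
    boxes = defaultdict(list)
    for p in points:
        boxes[tuple(int(c // cell_size) for c in p)].append(p)
    weights = {key: len(members) ** exponent for key, members in boxes.items()}
    total = sum(weights.values())
    sample = []
    for key, members in boxes.items():
        # Every non-empty box contributes at least one point.
        quota = max(1, round(sample_size * weights[key] / total))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Invented data: a dense cluster of 900 points and a sparse one of 100.
rng = random.Random(1)
dense = [(rng.random(), rng.random()) for _ in range(900)]
sparse = [(5 + 0.5 * rng.random(), 5 + 0.5 * rng.random()) for _ in range(100)]
sample = density_biased_sample(dense + sparse, cell_size=1.0, sample_size=100)
sparse_in_sample = sum(1 for p in sample if p[0] >= 5)
print(len(sample), sparse_in_sample)
```

In this example a uniform sample of 100 points would contain about 10 points from the sparse cluster; the biased quota gives it a larger share. Because quotas are rounded per box, the total sample size is only approximately `sample_size` in general.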

arXiv: Social and Information Networks | 2018

Complex-Network Tools to Understand the Behavior of Criminality in Urban Areas

Gabriel Spadon; Lucas C. Scabora; Marcus V.S. Araujo; Paulo H. Oliveir; Bruno Brandoli Machado; Elaine P. M. de Sousa; Caetano Traina; José Fernando Rodrigues

Complex networks are nowadays employed in several applications. Modeling urban street networks is one of them, in particular to analyze criminal aspects of a city. Several research groups have focused on this application, but until now there has been no well-defined methodology for employing complex networks in a whole crime analysis process, i.e. from data preparation to a deep analysis of criminal communities. Furthermore, the “toolset” available for those works is not complete enough, also lacking techniques to maintain up-to-date, complete crime datasets and proper assessment measures. In this sense, we propose a threefold methodology for employing complex networks in the detection of highly criminal areas within a city. Our methodology comprises three tasks: (i) Mapping of Urban Crimes; (ii) Criminal Community Identification; and (iii) Crime Analysis. Moreover, it provides a proper set of assessment measures for analyzing the intrinsic criminality of communities, especially when considering different crime types. We demonstrate our methodology by applying it to a real crime dataset from the city of San Francisco, CA, USA. The results confirm its effectiveness in identifying and analyzing high criminality areas within a city. Hence, our contributions provide a basis for further developments in complex networks applied to crime analysis.

2017 9th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp) | 2017

Agricultural monitoring using clustering techniques on satellite image time series of low spatial resolution

Renata Ribeiro do Valle Gonçalves; Jurandir Zullo; Luciana A. S. Romani; Bruno Ferraz do Amaral; Elaine P. M. de Sousa

This paper discusses how to use clustering analysis to discover and extract useful information from multi-temporal satellite images with low spatial resolution to improve the agricultural monitoring of sugarcane fields. A large database of satellite images and specific software were used to perform image pre-processing, time series extraction, application of the clustering method, and data visualization in several steps throughout the analysis process. The pre-processing phase corresponded to an accurate geometric correction, which is a requirement for applications of satellite image time series such as agricultural monitoring. Other steps in the analysis process were carried out through a graphical interface to improve the interaction with users. The approach was validated using NDVI images of sugarcane fields because of their economic importance as a source of ethanol and as an effective alternative to replace fossil fuels and mitigate greenhouse gas emissions. According to the analysis, the methodology allowed the identification of areas with similar agricultural development patterns, also considering different growing seasons for the crops, covering monthly and annual periods. Results confirm that satellite images of low spatial resolution, such as those from the AVHRR/NOAA sensors, can indeed be satisfactorily used to monitor agricultural crops at regional scale.
