Jonathan de Andrade Silva
University of São Paulo
Publication
Featured research published by Jonathan de Andrade Silva.
ACM Computing Surveys | 2013
Jonathan de Andrade Silva; Elaine R. Faria; Rodrigo C. Barros; Eduardo R. Hruschka; André Carlos Ponce Leon Ferreira de Carvalho; João Gama
Data stream mining is an active research area that has recently emerged to discover knowledge from large amounts of continuously generated data. In this context, several data stream clustering algorithms have been proposed to perform unsupervised learning. Nevertheless, data stream clustering imposes several challenges to be addressed, such as dealing with nonstationary, unbounded data that arrive in an online fashion. The intrinsic nature of stream data requires the development of algorithms capable of performing fast and incremental processing of data objects, suitably addressing time and memory limitations. In this article, we present a survey of data stream clustering algorithms, providing a thorough discussion of the main design components of state-of-the-art algorithms. In addition, this work addresses the temporal aspects involved in data stream clustering, and presents an overview of the usually employed experimental methodologies. A number of references are provided that describe applications of data stream clustering in different domains, such as network intrusion detection, sensor networks, and stock market analysis. Information regarding software packages and data repositories is also provided to help researchers and practitioners. Finally, some important issues and open questions that can be the subject of future research are discussed.
Pattern Recognition Letters | 2010
Hemerson Pistori; Valguima Victoria Viana Aguiar Odakura; João Bosco Oliveira Monteiro; Wesley Nunes Gonçalves; Antonia Railda Roel; Jonathan de Andrade Silva; Bruno Brandoli Machado
This paper proposes a novel way to combine different observation models in a particle filter framework. This so-called auto-adjustable observation model enhances the particle filter accuracy when the tracked objects overlap, without imposing a large runtime penalty on the whole tracking system. The approach has been tested under two important real-world situations related to animal behavior: mice and larvae tracking. The proposal was compared to some state-of-the-art approaches, and the results show, on the datasets tested, that a good trade-off between accuracy and runtime can be achieved using an auto-adjustable observation model.
Iberoamerican Congress on Pattern Recognition | 2010
Wesley Nunes Gonçalves; Jonathan de Andrade Silva; Odemir Martinez Bruno
Face recognition is an important field that has received a lot of attention from the computer vision community, with a diverse set of applications in industry and science. This paper introduces a novel graph-based method for face recognition which is rotation invariant. The main idea of the approach is to model the face image as a graph and use complex network methodology to extract a feature vector. We present the novel methodology and experiments comparing it with four important state-of-the-art algorithms. The results demonstrate that the proposed method achieves better results than the previous ones.
Data and Knowledge Engineering | 2013
Jonathan de Andrade Silva; Eduardo R. Hruschka
The substitution of missing values, also called imputation, is an important data preparation task for data mining applications. Imputation algorithms have been traditionally compared in terms of the similarity between imputed and original values. However, this traditional approach, sometimes referred to as prediction ability, does not allow inferring the influence of imputed values in the ultimate modeling tasks (e.g., in classification). Based on extensive experimental work, we study the influence of five nearest-neighbor based imputation algorithms (KNNImpute, SKNN, IKNNImpute, KMI and EACImpute) and two simple algorithms widely used in practice (Mean Imputation and Majority Method) on classification problems. In order to experimentally assess these algorithms, simulations of missing values were performed on six datasets by means of two missingness mechanisms: Missing Completely at Random (MCAR) and Missing at Random (MAR). The latter allows the probabilities of missingness to depend on observed data but not on missing data, whereas the former occurs when the distribution of missingness does not depend on the observed data either. The quality of the imputed values is assessed by two measures: prediction ability and classification bias. Experimental results show that IKNNImpute outperforms the other algorithms in the MCAR mechanism. KNNImpute, SKNN and EACImpute, in turn, provided the best results in the MAR mechanism. Finally, our experiments also show that the best prediction results (in terms of mean squared errors) do not necessarily yield less classification bias.
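The two missingness mechanisms and the prediction-ability measure described above can be illustrated with a minimal sketch. It is not the experimental code of the paper: the two-column synthetic data, the helper names (`make_mcar`, `make_mar`, `mean_impute`, `imputation_mse`), the 20% missing rate, and the median-based MAR rule are all assumptions chosen for illustration, and only Mean Imputation is shown.

```python
import random

def make_mcar(rows, rate, rng):
    """MCAR: every cell is masked with the same probability, regardless of any value."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]

def make_mar(rows, rate, rng):
    """MAR (illustrative rule): column 1 is masked with a probability that depends
    only on the always-observed column 0 (above or below its median)."""
    median0 = sorted(row[0] for row in rows)[len(rows) // 2]
    out = []
    for row in rows:
        p = min(1.0, 2 * rate) if row[0] > median0 else 0.0
        out.append([row[0], None if rng.random() < p else row[1]])
    return out

def mean_impute(rows):
    """Replace each missing cell with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fallback of 0.0 for a fully missing column is an arbitrary choice of this sketch.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def imputation_mse(original, masked, imputed):
    """Prediction ability: mean squared error over the cells that were masked."""
    errors = [(imputed[i][j] - original[i][j]) ** 2
              for i, row in enumerate(masked)
              for j, v in enumerate(row) if v is None]
    return sum(errors) / len(errors) if errors else 0.0

rng = random.Random(0)
data = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(200)]

masked = make_mcar(data, 0.2, rng)
imputed = mean_impute(masked)
mse = imputation_mse(data, masked, imputed)

mar = make_mar(data, 0.2, rng)
mar_mse = imputation_mse(data, mar, mean_impute(mar))
print(f"MCAR MSE: {mse:.3f}  MAR MSE: {mar_mse:.3f}")
```

Measuring classification bias would additionally require training a classifier on the original and on the imputed data and comparing the results, which this sketch leaves out.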
International Conference on Machine Learning and Applications | 2011
Jonathan de Andrade Silva; Eduardo R. Hruschka
Many algorithms for clustering data streams based on the widely used k-Means have been proposed in the literature. Most of them assume that the number of clusters, k, is known and fixed a priori by the user. Aiming to relax this assumption, which is often unrealistic in practical applications, we describe an algorithmic framework that allows estimating k automatically from data. We illustrate the potential of the proposed framework by using three state-of-the-art algorithms for clustering data streams (Stream LSearch, CluStream, and Stream KM++) combined with two well-known algorithms for estimating the number of clusters, namely Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). As an additional contribution, we experimentally compare the resulting algorithmic instantiations on both synthetic and real-world data streams. Analyses of statistical significance suggest that OMRk yields the best data partitions, while BkM is more computationally efficient. Also, the combination of Stream KM++ with OMRk leads to the best trade-off between accuracy and efficiency.
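The idea of estimating k through ordered multiple runs of k-Means can be sketched on a small static data set as follows. This is a simplified illustration rather than the framework from the paper: it ignores the streaming part entirely, and the choice of a centroid-based simplified silhouette as the selection criterion, the function names, and all parameter values are assumptions of this sketch.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sqdist(p, centroids[c]))
            for p in points]

def kmeans(points, k, rng, iters=20):
    """Plain k-Means with random initial centroids drawn from the data."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assign(points, centroids)

def simplified_silhouette(points, centroids, labels):
    """Average of (b - a) / max(a, b), where a is the distance to the own centroid
    and b is the distance to the nearest other centroid."""
    total = 0.0
    for p, l in zip(points, labels):
        a = sqdist(p, centroids[l]) ** 0.5
        b = min(sqdist(p, c) ** 0.5 for i, c in enumerate(centroids) if i != l)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)

def omrk(points, k_min, k_max, runs, rng):
    """Run k-Means several times for each k in increasing order; keep the best partition."""
    best = None
    for k in range(k_min, k_max + 1):
        for _ in range(runs):
            centroids, labels = kmeans(points, k, rng)
            score = simplified_silhouette(points, centroids, labels)
            if best is None or score > best[0]:
                best = (score, k, centroids, labels)
    return best

rng = random.Random(42)
blob_centers = [(0, 0), (10, 10), (20, 0)]  # three well-separated synthetic groups
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blob_centers for _ in range(30)]
score, k, centroids, labels = omrk(points, 2, 5, 10, rng)
print("estimated k:", k, "score:", round(score, 3))
```

The repeated runs per value of k are what make the procedure costly, which is consistent with the abstract's observation that the bisecting alternative is more computationally efficient.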
Intelligent Systems Design and Applications | 2009
Jonathan de Andrade Silva; Eduardo R. Hruschka
We describe an imputation method (EACImpute) that is based on an evolutionary algorithm for clustering. This method relies on the assumption that clusters of (partially unknown) data can provide useful information for imputation purposes. Experimental results obtained on five data sets illustrate different scenarios in which EACImpute performs similarly to widely used imputation methods, thus becoming eligible to join a pool of methods to be used in practical applications. In particular, imputation methods have traditionally been assessed only by some measures of their prediction capability. Although this evaluation is useful, we here also discuss the influence of imputed values in the classification task. Finally, our empirical results suggest that better prediction results do not necessarily imply less classification bias.
Hybrid Artificial Intelligence Systems | 2009
Jonathan de Andrade Silva; Eduardo R. Hruschka
This paper proposes a method for substituting missing values that is based on an evolutionary algorithm for clustering. Missing values substitution has been traditionally assessed by some measures of the prediction capability of imputation methods. Although this evaluation is useful, it does not allow inferring the influence of imputed values in the ultimate modeling task (e.g., in classification). In this sense, alternative approaches to the so-called prediction capability evaluation are needed. Therefore, we here also discuss the influence of imputed values in the classification task. Preliminary results obtained on a bioinformatics data set illustrate that the proposed imputation algorithm can introduce less classification bias than three state-of-the-art algorithms (i.e., KNNimpute, SKNN and IKNN). Finally, we illustrate that better prediction results do not necessarily imply less classification bias.
ACM Transactions on Autonomous and Adaptive Systems | 2016
Jonathan de Andrade Silva; Eduardo R. Hruschka
Many algorithms for clustering data streams that are based on the widely used k-Means have been proposed in the literature. Most of these algorithms assume that the number of clusters, k, is known and fixed a priori by the user. Aiming to relax this assumption, which is often unrealistic in practical applications, we propose a support system that allows not only estimating the number of clusters automatically from data but also monitoring the process of the data-stream clustering. We illustrate the potential of the proposed system by means of a prototype that implements eight algorithms for clustering data streams, namely, Stream LSearch-OMRk, Stream LSearch-BkM, Stream LSearch-IOMRk, Stream LSearch-IBkM, CluStream-OMRk, CluStream-BkM, StreamKM++-OMRk, and StreamKM++-BkM. These algorithms are combinations of three state-of-the-art algorithms for clustering data streams with fixed k, namely, Stream LSearch, CluStream, and StreamKM++, with two algorithms for estimating the number of clusters, which are Ordered Multiple Runs of k-Means (OMRk) and Bisecting k-Means (BkM). We experimentally compare the performance of these algorithms using both synthetic and real-world data streams. Analyses of statistical significance suggest that the algorithms that are based on OMRk yield the best data partitions, while the algorithms that are based on BkM are more computationally efficient. Additionally, StreamKM++-OMRk and Stream LSearch-IBkM provide the best trade-off between accuracy and efficiency.
Iberoamerican Congress on Pattern Recognition | 2010
Jonathan de Andrade Silva; Wesley Nunes Gonçalves; Bruno Brandoli Machado; Hemerson Pistori; Albert Schiaveto de Souza; Kleber Padovani de Souza
Shape representation provides fundamental features for many applications in computer vision and is known to provide important cues for human vision. This paper presents an experimental study on the recognition of mice behavior. We investigate the performance of four shape recognition methods, namely Chain-Code, Curvature, Fourier descriptors and Zernike moments. These methods are applied to a real database that consists of four mice behaviors. Our experiments show that Zernike moments and Fourier descriptors provide the best results. To evaluate the noise tolerance, we corrupt each contour with different levels of noise. In this scenario, Fourier descriptors show invariance to high levels of noise.
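As an illustration of one of the four methods, the following is a minimal sketch of Fourier descriptors computed from a closed contour. It is not the implementation evaluated in the paper: the normalization scheme (dropping the zero-frequency term, dividing by the magnitude of the first coefficient, keeping magnitudes only), the function name `fourier_descriptors`, and the synthetic contour are assumptions made for the example. The sketch also assumes a counterclockwise contour, so that the first coefficient is nonzero.

```python
import cmath
import math

def fourier_descriptors(contour, n_coeffs):
    """contour: list of (x, y) boundary points in traversal order.
    Returns n_coeffs magnitudes |F(k)| / |F(1)| for k = 2 .. n_coeffs + 1."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)

    def dft(k):
        # Direct O(n) evaluation of one discrete Fourier coefficient.
        return sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))

    # Skipping k = 0 removes the dependence on translation, taking magnitudes
    # removes rotation and starting point, dividing by |F(1)| removes scale.
    scale = abs(dft(1))
    return [abs(dft(k)) / scale for k in range(2, n_coeffs + 2)]

# Synthetic closed contour (a perturbed ellipse), sampled at 64 points.
n_points = 64
base = []
for i in range(n_points):
    t = 2 * math.pi * i / n_points
    base.append((3 * math.cos(t) + 0.5 * math.cos(3 * t),
                 2 * math.sin(t) + 0.3 * math.sin(2 * t)))

# The same shape rotated by 0.7 rad, scaled by 2.5, and shifted by (10, -4).
moved = []
for x, y in base:
    w = 2.5 * cmath.exp(0.7j) * complex(x, y) + complex(10, -4)
    moved.append((w.real, w.imag))

d_base = fourier_descriptors(base, 8)
d_moved = fourier_descriptors(moved, 8)
print(max(abs(a - b) for a, b in zip(d_base, d_moved)))
```

Because contour noise mostly affects high-frequency coefficients, keeping only a few low-order descriptors is one plausible reason for the noise tolerance reported above, although the sketch does not test that claim.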
Expert Systems With Applications | 2017
Jonathan de Andrade Silva; Eduardo R. Hruschka; João Gama