Vinícius Mourão Alves de Souza

Publication

Featured research published by Vinícius Mourão Alves de Souza.

Data Mining and Knowledge Discovery | 2014

CID: an efficient complexity-invariant distance for time series

Gustavo E. A. P. A. Batista; Eamonn J. Keogh; Oben M. Tataw; Vinícius Mourão Alves de Souza

The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While dozens of classification algorithms have been applied to time series, recent empirical evidence strongly suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm is important, and depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping, and cardiology data requires invariance to the baseline (the mean value). Similarly, recent work suggests that for time series clustering, the choice of clustering algorithm is much less important than the choice of distance measure used. In this work we make a somewhat surprising claim. There is an invariance that the community seems to have missed, complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where some complex objects may be incorrectly assigned to a simpler class. Similarly, for clustering this effect can introduce errors by “suggesting” to the clustering algorithm that subjectively similar, but complex objects belong in a sparser and larger diameter cluster than is truly warranted. We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification and clustering accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of triangular inequality, thus making use of most existing indexing and data mining algorithms.
We evaluate our ideas with the largest and most comprehensive set of time series mining experiments ever attempted in a single work, and show that complexity-invariant distance measures can produce improvements in classification and clustering in the vast majority of cases.
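
The abstract above does not spell out the measure, so the following is only a minimal sketch of the idea. It assumes the commonly cited form of the complexity-invariant distance: the Euclidean distance multiplied by a correction factor, where the correction factor is the ratio of the larger to the smaller complexity estimate, and the complexity estimate is the square root of the summed squared first differences of the series. The function names and the `eps` guard against a zero complexity estimate (a constant series) are assumptions of this sketch, not part of the published method.

```python
import numpy as np


def complexity_estimate(series):
    """Square root of the summed squared differences between consecutive points."""
    series = np.asarray(series, dtype=float)
    return float(np.sqrt(np.sum(np.diff(series) ** 2)))


def cid_distance(q, c, eps=1e-12):
    """Euclidean distance scaled by a complexity correction factor.

    eps is a hypothetical guard added for this sketch so that a constant
    series (complexity estimate of zero) does not cause a division by zero.
    """
    q = np.asarray(q, dtype=float)
    c = np.asarray(c, dtype=float)
    if q.shape != c.shape:
        raise ValueError("both series must have the same length")

    euclidean = float(np.linalg.norm(q - c))
    ce_q = complexity_estimate(q)
    ce_c = complexity_estimate(c)

    # The factor is 1 when both series are equally complex and grows
    # as their complexities diverge.
    correction = max(ce_q, ce_c) / max(min(ce_q, ce_c), eps)
    return euclidean * correction


if __name__ == "__main__":
    simple = [0.0, 1.0, 0.0, 1.0]
    complex_ = [0.0, 2.0, 0.0, 2.0]
    print(cid_distance(simple, complex_))
```

In a nearest neighbor classifier, a function like `cid_distance` would simply take the place of the plain Euclidean distance.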

SIAM International Conference on Data Mining | 2015

Data stream classification guided by clustering on nonstationary environments and extreme verification latency

Vinícius Mourão Alves de Souza; Diego Furtado Silva; João Gama; Gustavo E. A. P. A. Batista

Sao Paulo Research Foundation (FAPESP) (grant numbers 2011/17698-5, 2012/50714-7, 2013/26151-5)

International Conference on Data Mining | 2013

Time Series Classification Using Compression Distance of Recurrence Plots

Diego Furtado Silva; Vinícius Mourão Alves de Souza; Gustavo E. A. P. A. Batista

There is a huge increase in interest in time series methods and techniques. Virtually every piece of information collected from human, natural, and biological processes is susceptible to changes over time, and the study of how these changes occur is a central issue in fully understanding such processes. Among all time series mining tasks, classification is likely to be the most prominent one. In time series classification there is a significant body of empirical research that indicates that the k-nearest neighbor rule in the time domain is very effective. However, certain time series features are not easily identified in this domain and a change in representation may reveal some significant and unknown features. In this work, we propose the use of recurrence plots as a representation domain for time series classification. Our approach measures the similarity between recurrence plots using the Campana-Keogh (CK-1) distance, a Kolmogorov complexity-based distance that uses video compression algorithms to estimate image similarity. We show that recurrence plots allied to the CK-1 distance lead to significant improvements in accuracy rates compared to Euclidean distance and Dynamic Time Warping in several data sets. Although recurrence plots cannot provide the best accuracy rates for all data sets, we demonstrate that we can predict ahead of time that our method will outperform the time representation with Euclidean and Dynamic Time Warping distances.
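
The pipeline described in the abstract above (turn each series into a recurrence plot, then compare the plots with a compression-based distance) can be sketched roughly as follows. This is not the authors' implementation. It assumes the simplest unthresholded recurrence plot, without time-delay embedding, quantized to 8-bit gray levels. It also swaps the video compressor used by CK-1 for `zlib` from the Python standard library, while keeping the CK-1 ratio form (C(x|y) + C(y|x)) / (C(x|x) + C(y|y)) - 1, so the values it produces are not CK-1 values. All function names are illustrative.

```python
import zlib

import numpy as np


def recurrence_plot(series):
    """Unthresholded recurrence plot as an 8-bit grayscale image.

    Pixel (i, j) holds |x_i - x_j| rescaled to the range 0..255.
    """
    x = np.asarray(series, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    peak = dist.max()
    if peak > 0:
        dist = dist / peak
    return np.round(dist * 255).astype(np.uint8)


def _compressed_size(first, second):
    """Size in bytes of the two images compressed together, in this order."""
    return len(zlib.compress(first.tobytes() + second.tobytes(), 9))


def compression_distance(img_a, img_b):
    """Ratio-form compression distance with zlib as a stand-in compressor."""
    cross = _compressed_size(img_a, img_b) + _compressed_size(img_b, img_a)
    same = _compressed_size(img_a, img_a) + _compressed_size(img_b, img_b)
    return cross / same - 1.0


if __name__ == "__main__":
    t = np.linspace(0.0, 4.0 * np.pi, 64)
    plot_sine = recurrence_plot(np.sin(t))
    plot_ramp = recurrence_plot(t)
    print(compression_distance(plot_sine, plot_ramp))
```

A general-purpose byte compressor such as `zlib` ignores the two-dimensional structure that a video codec exploits, so this stand-in only illustrates the shape of the computation.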

Journal of Intelligent and Robotic Systems | 2015

Exploring Low Cost Laser Sensors to Identify Flying Insect Species

Diego Furtado Silva; Vinícius Mourão Alves de Souza; Daniel P. W. Ellis; Eamonn J. Keogh; Gustavo E. A. P. A. Batista

Insects have a close relationship with humanity, in both positive and negative ways. Mosquito-borne diseases kill millions of people, and insect pests consume and destroy around US$40 billion worth of food each year. In contrast, insects pollinate at least two-thirds of all the food consumed in the world. In order to control populations of disease vectors and agricultural pests, researchers in entomology have developed numerous methods including chemical, biological and mechanical approaches. However, without the knowledge of the exact location of the insects, the use of these techniques becomes costly and inefficient. We are developing a novel sensor as a tool to control disease vectors and agricultural pests. This sensor, which is built from inexpensive commodity electronics, captures insect flight information using laser light and classifies the insects according to their species. The use of machine learning techniques allows the sensor to automatically identify the species without human intervention. Finally, the sensor can provide real-time estimates of insect species with virtually no time gap between the insect identification and the delivery of population estimates. In this paper, we present our solution to the most important challenge to make this sensor practical: the creation of an accurate classification system. We show that, with the correct combination of feature extraction and machine learning techniques, we can achieve an accuracy of almost 90% in the task of identifying the correct insect species among nine species. Specifically, we show that we can achieve an accuracy of 95% in the task of correctly recognizing if a given event was generated by a disease vector mosquito.

International Conference on Machine Learning and Applications | 2013

Applying Machine Learning and Audio Analysis Techniques to Insect Recognition in Intelligent Traps

Diego Furtado Silva; Vinícius Mourão Alves de Souza; Gustavo E. A. P. A. Batista; Eamonn J. Keogh; Daniel P. W. Ellis

Throughout history, insects have had an intimate relationship with humanity, both positive and negative. Insects are vectors of diseases that kill millions of people every year and, at the same time, insects pollinate most of the world's food production. Consequently, there is a demand for new devices able to control the populations of harmful insects while having a minimal impact on beneficial insects. In this paper, we present an intelligent trap that uses a laser sensor to selectively classify and catch insects. We perform an extensive evaluation of different feature sets from audio analysis and machine learning algorithms to construct accurate classifiers for the insect classification task. Support Vector Machines achieved the best results with an MFCC feature set, which consists of coefficients from frequencies scaled according to the human auditory system. We evaluate our classifiers in multiclass and binary class settings, and show that a binary class classifier that recognizes the mosquito species achieved almost perfect accuracy, assuring the applicability of the proposed intelligent trap.

International Conference on Pattern Recognition | 2014

Extracting Texture Features for Time Series Classification

Vinícius Mourão Alves de Souza; Diego Furtado Silva; Gustavo E. A. P. A. Batista

Time series are present in many pattern recognition applications related to medicine, biology, astronomy, economy, and others. In particular, the classification task has attracted much attention from a large number of researchers. In such a task, empirical research has shown that the 1-Nearest Neighbor rule with a distance measure in the time domain usually performs well in a variety of application domains. However, certain time series features are not evident in the time domain. A classical example is the classification of sound, in which representative features are usually present in the frequency domain. For these applications, an alternative representation is necessary. In this work we investigate the use of recurrence plots as a data representation for time series classification. This representation has well-defined visual texture patterns and their graphical nature exposes hidden patterns and structural changes in data. Therefore, we propose a method capable of extracting texture features from this graphical representation, and use those features to classify time series data. We use traditional methods such as Grey Level Co-occurrence Matrix and Local Binary Patterns, which have shown good results in texture classification. In a comprehensible experimental evaluation, we show that our method outperforms the state-of-the-art methods for time series classification.

Ibero-American Conference on Artificial Intelligence | 2012

Spoken Digit Recognition in Portuguese Using Line Spectral Frequencies

Diego Furtado Silva; Vinícius Mourão Alves de Souza; Gustavo E. A. P. A. Batista; Rafael Giusti

Recognition of isolated spoken digits is the core procedure for a large and important number of applications, mainly in telephone-based services such as dialing, airline reservation, bank transaction and price quotation, only using speech. Spoken digit recognition is generally a challenging task since the signals last for a short period of time and often some digits are acoustically very similar to each other. The objective of this paper is to investigate the use of machine learning algorithms for digit recognition. We focus on the recognition of digits spoken in Portuguese. However, we note that our techniques are applicable to any language. We believe that the most important task for successfully recognizing spoken digits is the attribute extraction. Audio data is composed of a huge amount of very weak features, and most machine learning algorithms will not be able to build accurate classifiers. We show that Line Spectral Frequencies (LSF) provide a set of highly predictive coefficients for digit recognition. The results are superior to those obtained with state-of-the-art methods using Mel-Frequency Cepstrum Coefficients (MFCC) for digit recognition. In particular, we show that the choice of the right attribute extraction method is more important than the specific classification paradigm, and that the right combination of classifier and attributes can provide almost perfect accuracy.

International Conference on Pattern Recognition | 2014

Time Series Transductive Classification on Imbalanced Data Sets: An Experimental Study

Celso André R. de Sousa; Vinícius Mourão Alves de Souza; Gustavo E. A. P. A. Batista

Graph-based semi-supervised learning (SSL) algorithms perform well on a variety of domains, such as digit recognition and text classification, when the data lie on a low-dimensional manifold. However, it is surprising that these methods have not been effectively applied on time series classification tasks. In this paper, we provide a comprehensive empirical comparison of state-of-the-art graph-based SSL algorithms with respect to graph construction and parameter selection. Specifically, we focus in this paper on the problem of time series transductive classification on imbalanced data sets. Through a comprehensive analysis using recently proposed empirical evaluation models, we confirm some of the hypotheses raised in previous work and show that some of them may not hold in the time series domain. From our results, we suggest the use of the Gaussian Fields and Harmonic Functions algorithm with the mutual k-nearest neighbors graph weighted by the RBF kernel, setting k = 20 on general tasks of time series transductive classification on imbalanced data sets.

International Symposium on Neural Networks | 2015

An experimental analysis on time series transductive classification on graphs

Celso André R. de Sousa; Vinícius Mourão Alves de Souza; Gustavo E. A. P. A. Batista

Graph-based semi-supervised learning (SSL) algorithms perform well when the data lie on a low-dimensional manifold. Although these methods achieved satisfactory performance on a variety of domains, they have not been effectively evaluated on time series classification. In this paper, we provide a comprehensive empirical comparison of state-of-the-art graph-based SSL algorithms combined with a variety of graph construction methods in order to evaluate them on time series transductive classification tasks. Through a detailed experimental analysis using recently proposed empirical evaluation models, we show strong and weak points of these classifiers concerning both performance and stability with respect to graph construction and parameter selection. Our results show that some hypotheses raised in previous work do not hold in the time series domain while others may only hold under mild conditions.

International Symposium on Neural Networks | 2015

Effective insect recognition using a stacked autoencoder with maximum correntropy criterion

Yu Qi; Goktug T. Cinar; Vinícius Mourão Alves de Souza; Gustavo E. A. P. A. Batista; Yueming Wang; Jose C. Principe
