André Gustavo Maletzke

Publications

Featured research published by André Gustavo Maletzke.

Soft Computing | 2014

Time Series Classification with Motifs and Characteristics

André Gustavo Maletzke; Huei Diana Lee; Gustavo Enrique de Almeida Prado Alves Batista; Cláudio Saddy Rodrigues Coy; João José Fagundes; Wu Feng Chung

In recent years, interest in time series applications has grown considerably. Virtually all human endeavors create time-oriented data, and the Data Mining community has proposed a large number of approaches to analyze such data. One of the most common tasks in Data Mining is classification, in which each time series must be associated with a class. Empirical evidence has shown that the nearest neighbor rule is very effective for classifying time series data. However, the nearest neighbor classifier cannot provide any form of explanation. In this chapter, we describe a novel method to induce classifiers from time series data. Our approach applies standard Machine Learning classifiers using motifs and characteristics as features. We show that our approach can be very effective for classification, providing higher accuracy on most of the data sets used in an empirical evaluation. In addition, when used with symbolic models such as decision trees, our approach yields very compact decision rules, facilitating knowledge discovery from time series. We also present two case studies with real-world medical data.
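
As a rough illustration of the "characteristics" side of this idea, the sketch below turns a raw series into a fixed-length vector of global statistics that any standard classifier (for example, a decision tree) can consume. The function name `global_characteristics` and the particular statistics (mean, standard deviation, extremes, lag-1 autocorrelation, linear trend slope) are assumptions chosen for illustration, not the chapter's exact feature set.

```python
import statistics
from typing import Dict, List


def global_characteristics(series: List[float]) -> Dict[str, float]:
    """Summarize a whole series with a few global statistics (illustrative set)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Lag-1 autocorrelation: how strongly each value follows the previous one.
    if std == 0:
        acf1 = 0.0
    else:
        cov = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
        acf1 = cov / (n * std ** 2)
    # Least-squares slope of the series against its time index 0..n-1.
    t_mean = (n - 1) / 2
    t_var = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(series)) / t_var
    return {"mean": mean, "std": std, "min": min(series), "max": max(series),
            "acf1": acf1, "slope": slope}


# Usage: one fixed-length feature row per series, ready for a standard classifier.
rows = [list(global_characteristics(s).values())
        for s in ([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])]
```

Because every series maps to the same named features, a decision tree trained on these rows produces rules over interpretable quantities (for instance, a split on `slope`) rather than on individual time points.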

ACM Symposium on Applied Computing | 2018

Unsupervised context switch for classification tasks on data streams with recurrent concepts

Denis Moreira dos Reis; André Gustavo Maletzke; Gustavo E. A. P. A. Batista

In this paper, we propose a novel approach to deal with concept drift in data streams. We assume that labeled data for different concepts can be collected in the training phase; however, no labels are available in the test phase. Our approach stores a limited number of classification models and identifies, without supervision, the most suitable one for the current concept. Several real-world classification problems with extreme label latency fit this setting. One example is the identification of insect species using wing-beat data gathered by sensors in field conditions. The wing-beat frequency of flying insects is indirectly affected by temperature, among other factors. In this work, we show that we can dynamically identify the most appropriate classification model, among models trained on data collected under different temperature conditions, without any temperature information. We then extend the method to other data sets and obtain accurate results.
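
One simple way to picture unsupervised model selection is to keep, for each stored concept, a reference sample of an input feature (such as wing-beat frequency) and to pick the model whose reference sample is closest to the current unlabeled window. The sketch below uses the 1-D Wasserstein-1 distance between equal-size samples as that closeness measure. The distance, the names `select_model` and `wasserstein_1d`, and the concept labels are hypothetical choices, not necessarily what the paper does.

```python
from typing import Dict, Sequence, Tuple


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    # For equal sizes, W1 is the mean gap between matched order statistics.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def select_model(window: Sequence[float],
                 references: Dict[str, Sequence[float]],
                 models: Dict[str, object]) -> Tuple[str, object]:
    """Pick the stored model whose reference sample best matches the unlabeled window."""
    # No labels are used: only the input distribution of the window is compared.
    best = min(references, key=lambda c: wasserstein_1d(window, references[c]))
    return best, models[best]
```

Usage: with hypothetical references `{"cold": [1.0, 2.0, 3.0], "warm": [10.0, 11.0, 12.0]}`, a window `[9.0, 11.0, 12.0]` selects the `"warm"` model. The equal-size restriction keeps the distance a one-liner; reference windows can be subsampled to the window length.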

Brazilian Conference on Intelligent Systems | 2013

Symbolic Representation Based on Temporal Order Information for Time Series Classification

Willian Zalewski; Fabiano Silva; André Gustavo Maletzke; Feng Chung Wu; Huei Diana Lee

In the last decade, symbolic representation approaches have been proposed for knowledge discovery in time series. However, conventional symbolic methods ignore the temporal order of symbols, so this core feature of time series is lost. To address this problem, we present a symbolic representation method that incorporates temporal information into the symbols. The proposed method was evaluated with decision tree classification using the Symbolic Aggregate Approximation and Equal Fixed Values Discretization approaches, applied to 45 time series datasets that include artificial and real-world data. The experimental results demonstrate the method's effectiveness in improving classification accuracy and decision tree size for most datasets while preserving temporal order information in the symbolic representations.
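
For context, the sketch below implements plain Symbolic Aggregate Approximation (SAX), one of the discretizations the paper builds on: z-normalize the series, average it into equal segments (PAA), then map each segment mean to a letter using equiprobable Gaussian breakpoints. It does not implement the paper's temporal-order extension; the function name and the requirement that the length be a multiple of `n_segments` are simplifying assumptions.

```python
import bisect
import statistics
from typing import List


def sax(series: List[float], n_segments: int, alphabet_size: int) -> str:
    """Standard SAX word for a series (no temporal-order extension)."""
    n = len(series)
    if n % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of n_segments")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # z-normalization; a flat series maps to all zeros.
    z = [(x - mean) / std if std > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each equal-length segment.
    seg = n // n_segments
    paa = [statistics.fmean(z[i:i + seg]) for i in range(0, n, seg)]
    # Breakpoints that split N(0, 1) into alphabet_size equiprobable regions.
    nd = statistics.NormalDist()
    breakpoints = [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    return "".join(chr(ord("a") + bisect.bisect_right(breakpoints, v)) for v in paa)
```

A rising ramp `[1, 2, ..., 8]` with 4 segments and 4 symbols becomes `"abcd"`, and its reversal becomes `"dcba"`. Both words contain exactly the same symbols, so any feature that only counts symbols cannot tell them apart; this is the loss of temporal order the abstract describes.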

Eureka | 2013

Time Series Classification using Motifs and Characteristics Extraction: A Case Study on ECG Databases

André Gustavo Maletzke; Huei D. Lee; Gustavo E. A. P. A. Batista; Solange Oliveira Rezende; Renato Bobsin Machado; Richardson Floriani Voltolini; Joylan Nunes Maciel; Fabiano Silva

In the last decade, interest in temporal data analysis methods has increased significantly in many application areas. One of these areas is the medical field, in which temporal data is at the core of numerous diagnostic exams. However, only a small portion of all gathered medical data is properly analyzed, partly due to the lack of appropriate temporal methods and tools. This work presents an alternative approach, based on global characteristics and motifs, to mine medical time series databases using machine learning algorithms. Characteristics are data statistics that present a global summary of the data. Motifs are frequently recurring subsequences that usually represent interesting local patterns. We use a combination of global characteristics and local motifs to describe the data and feed machine learning algorithms. A case study is performed on three databases of electrocardiogram (ECG) exams. Our results show the superior performance of our approach compared with the naive method that provides raw temporal data directly to the learning algorithms. We demonstrate that our approach is more accurate and provides more interpretable models than the method that does not extract features.
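
To illustrate how local motifs can become classifier features, the sketch below counts how often each motif appears in a series, giving one integer per motif. It assumes the motifs have already been discovered; the Euclidean matching radius, the rule that skips overlapping matches, and the names `motif_count` and `motif_features` are hypothetical choices, not the paper's exact procedure.

```python
import math
from typing import List, Sequence


def motif_count(series: Sequence[float], motif: Sequence[float], radius: float) -> int:
    """Count non-overlapping windows of `series` within `radius` (Euclidean) of `motif`."""
    m = len(motif)
    count, i = 0, 0
    while i + m <= len(series):
        if math.dist(series[i:i + m], motif) <= radius:
            count += 1
            i += m  # jump past the match so overlapping (trivial) matches are not counted
        else:
            i += 1
    return count


def motif_features(series: Sequence[float], motifs: List[Sequence[float]],
                   radius: float) -> List[int]:
    """One count per motif: a fixed-length description of local patterns."""
    return [motif_count(series, mo, radius) for mo in motifs]
```

For example, `[0, 1, 0, 0, 1, 0, 5, 5]` with motifs `[0, 1, 0]` and `[5, 5]` and radius `0.1` yields `[2, 1]`. Concatenating such counts with global statistics gives the kind of combined global-plus-local description the abstract refers to.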

Knowledge Discovery and Data Mining | 2018

Classifying and Counting with Recurrent Contexts

Denis Moreira dos Reis; André Gustavo Maletzke; Diego Furtado Silva; Gustavo E. A. P. A. Batista

Many real-world applications in batch and data stream settings with data shift restrict access to class labels after a classification or quantification model is deployed. However, a significant portion of the data stream literature assumes that actual labels are available immediately after the corresponding classifications are issued. In this paper, we explore a different set of assumptions that does not rely on the availability of class labels. We assume that, although the data distribution may change over time, it switches among a handful of well-known distributions. Still, we allow the class proportions to vary. Under these conditions, we propose the first method that can accurately identify the correct context of data samples and simultaneously estimate the proportion of the positive class. This estimate can further be used to adjust the classification decision threshold and improve classification accuracy. Finally, the method is very efficient in terms of time and memory requirements, making it suitable for data stream applications.
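
The paper's own method is not reproduced here. As background, the sketch below shows two standard building blocks for this kind of problem: the Adjusted Classify and Count (ACC) estimate of the positive-class proportion, which corrects the raw fraction of positive predictions using the classifier's true and false positive rates, and a prior-shift correction of a posterior probability once a new class prior has been estimated. The function names are hypothetical, and `tpr`/`fpr` are assumed to have been measured on labeled training data.

```python
from typing import Sequence


def adjusted_classify_and_count(predictions: Sequence[int], tpr: float, fpr: float) -> float:
    """ACC: solve cc = tpr * p + fpr * (1 - p) for p, clipped to [0, 1]."""
    if tpr <= fpr:
        raise ValueError("ACC needs tpr > fpr")
    cc = sum(predictions) / len(predictions)  # raw fraction predicted positive
    p = (cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))


def adjust_posterior(p_pos: float, train_prior: float, new_prior: float) -> float:
    """Re-weight P(+|x) from the training prior to a new prior (Bayes' rule)."""
    w_pos = new_prior / train_prior
    w_neg = (1 - new_prior) / (1 - train_prior)
    return w_pos * p_pos / (w_pos * p_pos + w_neg * (1 - p_pos))
```

Thresholding the adjusted posterior at 0.5 is equivalent to moving the decision threshold on the original posterior. For example, a score of 0.5 under a balanced training prior becomes 0.8 when the estimated positive proportion is 0.8.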

Journal of the Brazilian Computer Society | 2018

Combining instance selection and self-training to improve data stream quantification

André Gustavo Maletzke; Denis Moreira dos Reis; Gustavo E. A. P. A. Batista

In recent years, learning from data streams has attracted the attention of researchers and practitioners due to its large number of applications. These applications have motivated the research community to propose a significant number of methods for diverse tasks, most prominently classification, clustering, and anomaly detection. However, a relevant task known as quantification has remained mostly unexplored. The goal of quantification is to estimate the class prevalence in an unlabeled set. Recently, we proposed the SQSI algorithm to quantify data streams with concept drift. SQSI uses a statistical test to identify concept drifts and retrain the classifiers. However, retraining requires the labels of all newly arrived instances. In this paper, we extend the SQSI algorithm by combining instance selection techniques with semi-supervised learning. The idea is to request the classes of only a small subset of recent examples. Our experiments demonstrate that SQSI’s extension significantly reduces the dependency on actual labels while maintaining or improving quantification accuracy.
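
The abstract does not name the statistical test SQSI uses. Assuming, for illustration, a two-sample Kolmogorov-Smirnov (KS) test that compares classifier scores on a reference window with scores on the current window, drift detection could look like the sketch below. The critical value uses the standard large-sample approximation; the names `ks_statistic` and `drift_detected` and the default `alpha` are hypothetical.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum gap between the empirical CDFs of two samples."""
    a_s, b_s = sorted(a), sorted(b)
    d = 0.0
    # The supremum is attained at one of the observed values.
    for v in a_s + b_s:
        fa = bisect.bisect_right(a_s, v) / len(a_s)
        fb = bisect.bisect_right(b_s, v) / len(b_s)
        d = max(d, abs(fa - fb))
    return d


def drift_detected(ref_scores: Sequence[float], cur_scores: Sequence[float],
                   alpha: float = 0.05) -> bool:
    """Flag drift when the KS statistic exceeds its asymptotic critical value."""
    n, m = len(ref_scores), len(cur_scores)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(ref_scores, cur_scores) > critical
```

Testing scores rather than raw features keeps the check one-dimensional, whatever the input dimensionality. When drift is flagged, a method in the spirit of the extension described above would request labels for only a selected subset of recent instances before retraining.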

IEEE Latin America Transactions | 2017

FB-DT: An improvement in the Brute Force algorithm for motifs discovery

Lucas Guilherme Hubner; André Gustavo Maletzke; Barbara Lepretti de Nadai; Ricardo Luís Schaefer; Willian Zalewski; Carlos Andres Ferrero

Interest in time series analysis using motif extraction has expanded to different areas. However, due to the complexity and dimensionality of time series datasets, this task may become computationally restrictive in certain cases. Thus, several methods have been proposed that use the Brute Force algorithm as a baseline. In this work, we propose an improvement to the Brute Force algorithm aimed at reducing its execution time and, consequently, allowing its use in a larger number of situations. Experimental results show a significant reduction in the execution time of the Brute Force algorithm.
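
For reference, the sketch below is the classic brute-force motif search that serves as the baseline here: it compares every pair of non-overlapping subsequences of length `m` and returns the closest pair, at O(n^2 m) cost. The FB-DT improvement itself is not shown; the function name and the non-overlap rule (`j >= i + m`) are assumptions of this sketch.

```python
import math
from typing import Sequence, Tuple


def brute_force_motif(series: Sequence[float], m: int) -> Tuple[float, int, int]:
    """Return (distance, i, j) for the closest pair of non-overlapping length-m windows."""
    n = len(series) - m + 1  # number of candidate subsequences
    best = (math.inf, -1, -1)
    for i in range(n):
        # Starting j at i + m excludes overlapping (trivial) matches.
        for j in range(i + m, n):
            d = math.dist(series[i:i + m], series[j:j + m])
            if d < best[0]:
                best = (d, i, j)
    return best
```

For example, `brute_force_motif([0, 1, 2, 9, 9, 9, 0, 1, 2, 5], 3)` returns `(0.0, 0, 6)`, because the pattern `[0, 1, 2]` recurs at positions 0 and 6. The nested loops make clear why execution time becomes a bottleneck on long series.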

Eureka | 2013

Method and System for Real-Time Audio and Video Transmission of Colonoscopy Exams: A Study Case on a Local Network

Renato Bobsin Machado; Huei Diana Lee; Joylan Nunes Maciel; Richardson Floriani Voltolini; André Gustavo Maletzke; Cláudio Saddy Rodrigues Coy; João José Fagundes; Feng Chung Wu

Telemedicine can facilitate the examination and diagnosis of patients in locations that lack resources and medical experts. This paper presents an innovative method and a computer system that allow real-time text, voice, and video interaction among participants, as well as sharing of colonoscopy exam data over the Internet. The proposed method implements a medical database that will later be explored using data mining methods. The functionalities and performance of the method were evaluated in a local network through the development of a computational system. The results validated the solution, showing its applicability to colonoscopy exams.

Brazilian Symposium on Bioinformatics | 2008

Evaluation of Models for the Recognition of Handwritten Digits in Medical Forms

Willian Zalewski; Huei Diana Lee; Adewole M. J. F. Caetano; Ana Carolina Lorena; André Gustavo Maletzke; João José Fagundes; Cláudio Saddy Rodrigues Coy; Feng Chung Wu

Medicine has benefited widely from computational techniques, which are often employed to analyze data generated in medical clinics. Among the computational techniques used in these analyses are those from Knowledge Discovery in Databases (KDD). To apply KDD techniques to clinical data, it is often necessary to map the data into an adequate structured format. This paper presents an extension to a methodology for mapping medical forms into structured datasets, in which a sub-system for handwritten digit recognition is added to the overall mapping system.

Brazilian Conference on Intelligent Systems | 2017