Fabrício Olivetti de França



Publication

Featured research published by Fabrício Olivetti de França.

congress on evolutionary computation | 2013

Extending features for multilabel classification with swarm biclustering

Ronaldo C. Prati; Fabrício Olivetti de França

In some data mining applications, the analyzed data can belong to more than one class simultaneously; this characterizes the multi-label classification problem. Numerous methods for dealing with this problem are based on decomposition, which essentially treats labels (or some subsets of labels) independently and ignores interactions between them. This can be a problem, as some labels may be correlated with local patterns in the data. In this paper, we propose to enhance multi-label classifiers with the aid of biclusters, which are capable of finding correlations between subsets of objects, features and labels. We then construct binary features from these patterns that can be interpreted as local correlations (in terms of subsets of features and instances) in the data. These features are used as input for multi-label classifiers. We experimentally show that using such constructed features can improve the classification performance of some decompositive multi-label learning techniques.
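As a rough illustration of how bicluster patterns might be turned into binary input features, here is a minimal Python sketch. It assumes binary data and represents each bicluster only by its feature subset; the names `bicluster_features` and `extend_dataset` and the "all features switched on" matching rule are illustrative assumptions, not the procedure used in the paper.

```python
def bicluster_features(instance, biclusters):
    """Return one binary feature per bicluster.

    Assumption: a bicluster is given as a tuple of column indices, and an
    instance matches it when every one of those columns equals 1.
    """
    return [int(all(instance[j] == 1 for j in cols)) for cols in biclusters]


def extend_dataset(dataset, biclusters):
    """Append the bicluster-derived binary features to every instance."""
    return [list(row) + bicluster_features(row, biclusters) for row in dataset]


# Hypothetical toy data: two instances, three binary features.
data = [[1, 0, 1], [1, 1, 0]]
# Hypothetical biclusters, described only by their feature subsets.
patterns = [(0, 2), (0, 1)]
extended = extend_dataset(data, patterns)
```

The extended rows could then be passed to any multi-label classifier in place of the original feature vectors.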

international conference on machine learning and applications | 2012

Scalable Overlapping Co-clustering of Word-Document Data

Fabrício Olivetti de França

Text clustering is used in a variety of applications such as content-based recommendation, categorization, summarization, information retrieval and automatic topic extraction. Since most pairs of documents usually share just a small percentage of words, the dataset representation tends to become very sparse, hence the need for a similarity metric capable of partially matching a set of features. The technique known as Co-Clustering is capable of finding several clusters inside a dataset, with each cluster composed of just a subset of the object and feature sets. In word-document data this can be useful to identify clusters of documents pertaining to the same topic, even though they share just a small fraction of words. In this paper, a scalable co-clustering algorithm is proposed that uses the Locality-Sensitive Hashing technique to find co-clusters of documents. The proposed algorithm is tested against other co-clustering and traditional algorithms on well-known datasets. The results show that this algorithm is capable of finding clusters more accurately than other approaches while maintaining linear complexity.
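To make the role of Locality-Sensitive Hashing more concrete, the following Python sketch groups documents with a generic MinHash-plus-banding scheme. It is not the algorithm proposed in the paper; the function names, the use of MD5 as a hash family, and the parameters `num_hashes` and `band_size` are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


def _hash(seed, token):
    """Deterministic hash of a token under a given seed (assumed hash family)."""
    return int(hashlib.md5(f"{seed}:{token}".encode()).hexdigest(), 16)


def minhash_signature(tokens, num_hashes=4):
    """MinHash signature: the minimum hash value of the token set per seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def lsh_buckets(docs, num_hashes=4, band_size=2):
    """Group document ids whose signatures agree on at least one band.

    `docs` maps a document id to a non-empty set of words.
    """
    buckets = defaultdict(set)
    for doc_id, tokens in docs.items():
        sig = minhash_signature(tokens, num_hashes)
        for start in range(0, num_hashes, band_size):
            buckets[(start, sig[start:start + band_size])].add(doc_id)
    # Keep only buckets that actually group more than one document.
    return {key: ids for key, ids in buckets.items() if len(ids) > 1}


# Hypothetical toy corpus.
docs = {"a": {"hash", "cluster"}, "b": {"hash", "cluster"}, "c": {"network"}}
groups = lsh_buckets(docs)
```

Each document is hashed once per band, so the grouping step grows linearly with the number of documents, which is the property that makes hashing attractive for sparse word-document data.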

Expert Systems With Applications | 2016

A hash-based co-clustering algorithm for categorical data

Fabrício Olivetti de França

The proposal of a new Co-Clustering approach for categorical data. The proposed algorithm scales linearly with the data size. The results show the quality of the clusters found and a diverse set of applications for such an approach.

Cluster analysis, or clustering, refers to the analysis of the structural organization of a data set. This analysis is performed by grouping together objects of the data that are more similar among themselves than to objects of different groups. The sampled data may be described by numerical features or by a symbolic representation, known as categorical features. These features often require a transformation into numerical data in order to be properly handled by clustering algorithms. The transformation usually assigns a weight to each feature, calculated by a measure of importance (e.g., frequency, mutual information). A problem with the weight assignment is that the values are calculated with respect to the whole set of objects and features. This may pose a problem when a subset of the features has a higher degree of importance to one subset of objects but a lower degree to another. One way to deal with this problem is to measure the importance of each subset of features only with respect to a subset of objects. This is known as co-clustering which, similarly to clustering, is the task of finding a subset of objects and features that presents a higher similarity among themselves than to other subsets of objects and features. As one might notice, this task has a higher complexity than traditional clustering and, if not properly dealt with, may present a scalability issue. In this paper we propose a novel co-clustering technique, called HBLCoClust, with the objective of extracting a set of co-clusters from a categorical data set, without the guarantees of an enumerative algorithm, but with scalability in exchange. This is done by using a probabilistic clustering algorithm, named Locality Sensitive Hashing, together with the enumerative algorithm named InClose. The experimental results are competitive when applied to labeled categorical data sets and text corpora. Additionally, it is shown that the extracted co-clusters can be of practical use to expert systems such as Recommender Systems and Topic Extraction.

congress on evolutionary computation | 2015

Maximization of a dissimilarity measure for multimodal optimization

Fabrício Olivetti de França

Many practical problems are described by an objective function with the intent of optimizing a single goal. This leads to the important research topic of nonlinear optimization, which seeks to create algorithms and computational methods capable of finding a global optimum of such functions. However, many functions are multimodal, having many different global optima. Also, given the impossibility of creating an exact model of a real-world problem, not every global (or local) optimum is feasible in practice. As such, it is interesting to find as many alternative optima as possible in order to find one that is feasible given unmodelled constraints. This paper proposes a methodology that, given a local optimum, finds nearby local optima with similar objective-function values. This is performed by maximizing the approximation error of a Linear Interpolation of the function. The experiments show promising results regarding the number of detected peaks when compared to the state of the art, though requiring a higher number of function evaluations on average.
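The idea of using interpolation error as a dissimilarity measure can be sketched in one dimension as follows. This is only one simple reading of the idea, assuming the error is measured at the midpoint between a known optimum `x0` and a candidate `x`; the function names and the grid search are illustrative assumptions and do not reproduce the paper's formulation.

```python
import math


def interpolation_error(f, x0, x):
    """Gap between f at the midpoint and the straight line joining (x0, f(x0)) and (x, f(x))."""
    midpoint = (x0 + x) / 2.0
    interpolated = (f(x0) + f(x)) / 2.0
    return abs(f(midpoint) - interpolated)


def most_dissimilar(f, x0, candidates):
    """Return the candidate whose linear interpolation from x0 is worst."""
    return max(candidates, key=lambda x: interpolation_error(f, x0, x))


# Toy example: cos has maxima at 0 and 2*pi; start from the optimum at 0.
candidates = [0.5 + 0.01 * k for k in range(900)]
best = most_dissimilar(math.cos, 0.0, candidates)
```

For this toy function the midpoint error equals |c - c^2| with c = cos(x/2), which peaks where c = -1, so the selected candidate lies next to the neighbouring maximum at 2π: a large error signals that a valley separates the two points while their objective values are similar.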

Expert Systems With Applications | 2015

A biclustering approach for classification with mislabeled data

Fabrício Olivetti de França; André L. V. Coelho

We propose a biclustering approach (BicNoise) for coping with mislabeled data. We assess three variants of BicNoise on binary classification problems. The proposed strategy is mostly useful when noise levels are high.

Labeling samples in large data sets is a demanding task prone to different sources of error. Those errors, denoted as noise, can significantly impact the performance of a classification algorithm due to overfitting of wrongly labeled data. So far, this problem has been treated by avoiding the overfitting and by correcting mislabeled data through similarity analysis. The former approach can be affected by the curse of dimensionality, and some mislabeled data will not be corrected. In this paper, we investigate the use of a biclustering approach to capture local models of coherence across subsets of instances and attributes. Those models are used to replace and augment the attributes of the original dataset. Through a systematic series of experiments, we have assessed the performance of the proposed approach, referred to as BicNoise, by considering different rates and types of label noise, and also different types of classifiers, binary datasets, and evaluation metrics. The good results achieved suggest that the transformed data can alleviate the dimensionality problem, reduce the redundancy of correlated features and improve the separability of the data, thus improving classifier performance (most noticeably in the highest noise settings).

ChemBioChem | 2015

An Electronic Game Environment for Evaluating Coevolutionary Algorithms

Karine da Silva Miras de Araujo; Fabrício Olivetti de França

One application of artificial intelligence in electronic games is making an artificial agent learn to perform a given task successfully. This requires an algorithm that can learn to determine sequences of actions according to the observed environment. Several supervised learning techniques exist for this purpose, allowing the correct response to be learned from examples. However, in electronic games the correct response can only be measured after the sequence of actions is complete, so it is impossible to determine the correct response at each time step. One way to circumvent this problem is Neuroevolution, which trains an Artificial Neural Network with evolutionary algorithms so that it succeeds in the final outcome of its actions. In this paper, we introduce a new reference environment for testing learning algorithms for autonomous agents in electronic games, called EvoMan, inspired by the platform game Mega Man II. This platform covers the following scenarios: learning in static and dynamic environments, continuous coevolutionary learning, and generalized learning. As initial experiments, we apply Neuroevolution using Genetic Algorithms and the NEAT algorithm in a coevolution context to demonstrate the challenges of the proposed platform.

Information Sciences | 2018

A greedy search tree heuristic for symbolic regression

Fabrício Olivetti de França

Symbolic Regression tries to find a mathematical expression that describes the relationship between a set of explanatory variables and a measured variable. The main objective is to find a model that minimizes the error and, optionally, also minimizes the expression size. A smaller expression can be seen as an interpretable model, which is considered a reliable decision model. This is often performed with Genetic Programming, which represents its solutions as expression trees. The shortcoming of this algorithm lies in this representation, which defines a rugged search space and contains expressions of any size and difficulty. These pose a challenge to finding the optimal solution under computational constraints. This paper introduces a new data structure, called Interaction-Transformation (IT), that constrains the search space in order to exclude a region of larger and more complicated expressions. In order to test this data structure, a heuristic called SymTree is also introduced. The obtained results show evidence that SymTree is capable of obtaining the optimal solution whenever the target function is within the search space of the IT data structure, and competitive results when it is not. Overall, the algorithm found a good compromise between accuracy and simplicity for all the generated models.
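The abstract does not spell out the Interaction-Transformation structure, so the Python sketch below rests on an assumed reading of it: each term is an interaction (a product of the variables raised to fixed exponents) passed through a one-argument transformation function, and a model is a weighted sum of such terms. The names `it_term`, `it_expression` and `identity` are hypothetical.

```python
import math


def identity(z):
    """Transformation that leaves the interaction unchanged."""
    return z


def it_term(x, exponents, transform):
    """Evaluate one term: transform(prod_j x[j] ** exponents[j])."""
    interaction = 1.0
    for xj, k in zip(x, exponents):
        interaction *= xj ** k
    return transform(interaction)


def it_expression(x, terms, weights):
    """Evaluate the weighted sum of terms at the point x."""
    return sum(w * it_term(x, exps, t) for w, (exps, t) in zip(weights, terms))


# Hypothetical model: 2 * (x0 * x1) + 3 * sqrt(x0 ** 2)
terms = [((1, 1), identity), ((2, 0), math.sqrt)]
weights = [2.0, 3.0]
value = it_expression((2.0, 3.0), terms, weights)
```

Under this reading, the weights enter linearly, so for a fixed set of terms they can be fitted with ordinary linear regression, and expressions such as nested function compositions are excluded from the search space by construction.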

CompleNet | 2015

A Flexible Fitness Function for Community Detection in Complex Networks

Fabrício Olivetti de França; Guilherme Palermo Coelho

Most community detection algorithms from the literature work as optimization tools that minimize a given quality (or fitness) function, while assuming that each node belongs to a single community. Although several studies propose fitness functions for the detection of communities, the definition of what a community is remains vague. Therefore, each proposed fitness function leads to communities that reflect the particular definition of community adopted by its authors. Besides, such communities do not always correspond to the real partition observed in practice. This paper proposes a new flexible fitness function for community detection that allows the user to obtain communities that reflect distinct characteristics according to what is needed. This new fitness function was combined with an adapted version of the immune-inspired optimization algorithm named cob-aiNet[C] and applied to identify (both disjoint and overlapping) communities in a set of artificial and real-world complex networks. The results show that the partitions obtained by optimizing this new metric are more coherent (when compared to the real, known partitions) than those obtained with one of the most widely adopted functions from the literature: modularity.
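For reference, the modularity baseline mentioned above can be computed as in the following Python sketch, which uses the standard Newman formulation for an undirected, unweighted graph: the sum over communities of the fraction of internal edges minus the squared fraction of total degree. The function name and the edge-list input format are assumptions; the paper's own flexible fitness function is not reproduced here.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q = sum_c [ L_c / m - (d_c / (2 * m)) ** 2 ].

    edges: list of (u, v) pairs of an undirected graph, each edge listed once.
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)    # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, split)
q_single = modularity(edges, {n: "a" for n in range(6)})
```

Placing every node in a single community gives a modularity of zero, while separating the two triangles gives a positive score, which is why the measure is commonly used to rank candidate partitions.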

congress on evolutionary computation | 2013

Identifying overlapping communities in complex networks with multimodal optimization

Fabrício Olivetti de França; Guilherme Palermo Coelho

The analysis of complex networks is an important research topic that helps us understand the underlying behavior of complex systems and the interactions of their components. One particularly relevant analysis is the detection of communities formed by such interactions. Most community detection algorithms work as optimization tools that minimize a given quality function, while assuming that each node belongs to a single community. However, most complex networks contain nodes that belong to two or more communities, which are called bridges. The identification of bridges is crucial to several problems, as they often play important roles in the system described by the network. By exploiting the multimodality of quality functions, it is possible to obtain distinct optimal communities where, in each solution, each bridge node belongs to a distinct community. This paper proposes a technique that tries to identify a set of (possibly) overlapping communities by combining diverse solutions contained in a pool, which correspond to disjoint community partitions of a given network. To obtain the pool of partitions, an adapted version of the immune-inspired algorithm named cob-aiNet[C] was adopted here. The proposed methodology was applied to four real-world social networks and the obtained results were compared to those reported in the literature. The comparisons have shown that the proposed approach is competitive and even capable of overcoming the best results reported for some of the problems.

Social Network Analysis and Mining | 2018

User profiling of the Twitter Social Network during the impeachment of Brazilian President

Fabrício Olivetti de França; Denise Hideko Goya; Claudio Luis de Camargo Penteado

The impeachment process that took place in Brazil in April 2016 generated a large number of posts on social networks. These posts came from ordinary people, journalists, traditional and independent media, politicians and supporters. Identifying the impact of this subject on each group of users can be an important analysis to verify the real interest of common Brazilian citizens in this matter. As such, we propose a way to segment the users into popular, activists and observers in order to filter the information and help us give a more detailed analysis of the event. The proposed segmentation may also help other studies related to the usage of Twitter during important events.
