
Daniel de Araújo

Federal University of Rio Grande do Norte

Publication

Featured research published by Daniel de Araújo.

Neurocomputing | 2009

Multi-objective clustering ensemble for gene expression data analysis

Katti Faceli; Marcílio Carlos Pereira de Souto; Daniel de Araújo; André Carlos Ponce Leon Ferreira de Carvalho

In this paper, we present an algorithm for cluster analysis that integrates aspects of cluster ensembles and multi-objective clustering. The algorithm is based on a Pareto-based multi-objective genetic algorithm with a special crossover operator, and it uses clustering validation measures as objective functions. The proposed algorithm can deal with data sets presenting different types of clusters, without requiring expertise in cluster analysis. Its result is a concise set of partitions representing alternative trade-offs among the objective functions. We compare the results obtained with our algorithm, in the context of gene expression data sets, to those achieved with multi-objective clustering with automatic K-determination (MOCK), the algorithm most closely related to ours.
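The "concise set of partitions representing alternative trade-offs" mentioned above corresponds to a Pareto front. The following is a minimal sketch of Pareto filtering only, assuming each candidate partition has already been scored on objectives that are all minimized. The function names and the example scores are hypothetical, and the genetic algorithm and crossover operator of the paper are not reproduced here.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Scores]) -> List[Scores]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical (objective 1, objective 2) scores for four partitions.
example_scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(example_scores)
print(front)
```

In this example the partition scored (3.0, 3.0) is dropped because (2.0, 2.0) is better on both objectives; the remaining three are mutually non-dominated trade-offs.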

International Symposium on Neural Networks | 2011

Comparative study on dimension reduction techniques for cluster analysis of microarray data

Daniel de Araújo; Adrião Duarte Dória Neto; Allan de Medeiros Martins; Jorge Dantas de Melo

This paper proposes a study on the impact of dimension reduction techniques (DRTs) on the quality of partitions produced by cluster analysis of microarray datasets. We tested seven DRTs applied to four microarray cancer datasets and ran four clustering algorithms using the original and reduced datasets. Overall results showed that using DRTs improves the performance of all algorithms tested, especially those in the hierarchical class. We could see that, despite Principal Component Analysis (PCA) being the most widely used DRT, it was outperformed by nonlinear methods and did not provide a substantial performance increase in the clustering algorithms. On the other hand, t-distributed Stochastic Neighbor Embedding (t-SNE) and Laplacian Eigenmaps (LE) achieved good results for all datasets.

Expert Systems With Applications | 2013

Information-theoretic clustering: A representative and evolutionary approach

Daniel de Araújo; Adrião Duarte Dória Neto; Allan de Medeiros Martins

This paper proposes a new perspective on non-parametric entropy-based clustering. We developed a new cost evaluation function for clustering that measures the cross information potential (CIP) between clusters on a dataset using representative points, which we called representative CIP (rCIP). We did this based on the idea that optimizing the cross information potential is equivalent to minimizing cross entropy between clusters. Our measure is different because, instead of using all points in a dataset, it uses only representative points to quantify the interaction between distributions without any loss of the original properties of cross information potential. This brings a double advantage: it decreases the computational cost of computing the cross information potential, thus drastically reducing the running time, and it uses the underlying statistics of the space region where the representative points lie in order to measure interaction. With this, we created a useful non-parametric estimator of entropy and made it possible to use cross information potential in applications where it previously could not be used. Due to the nature of clustering problems, we proposed a genetic algorithm that uses rCIP as its cost function. We ran several tests and compared the results with the single linkage hierarchical algorithm, finite mixture of Gaussians and spectral clustering on both synthetic and real image segmentation datasets. Experiments showed that our approach achieved better results than the other algorithms and was capable of capturing the real structure of the data in most cases, regardless of its complexity. It also produced good image segmentation, with the advantage of a tuning parameter that provides a way of refining the segmentation.
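For orientation, the quantity that rCIP approximates can be sketched with the standard all-points Parzen-window estimator of the cross information potential, which averages a Gaussian kernel over every pair of points drawn from two clusters. This is a sketch of that baseline estimator only, not of the representative-point measure proposed in the paper; the function names, the kernel width `sigma`, and the sample points are assumptions made for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point, sigma: float) -> float:
    """Isotropic Gaussian with variance 2 * sigma**2 evaluated at x - y."""
    dim = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    norm = (4.0 * math.pi * sigma ** 2) ** (-dim / 2.0)
    return norm * math.exp(-sq_dist / (4.0 * sigma ** 2))


def cross_information_potential(
    cluster_a: Sequence[Point],
    cluster_b: Sequence[Point],
    sigma: float = 1.0,
) -> float:
    """Average pairwise kernel value between two clusters.

    Cost is O(len(cluster_a) * len(cluster_b)), which is the expense
    that motivates using fewer, representative points.
    """
    total = sum(
        gaussian_kernel(x, y, sigma) for x in cluster_a for y in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical 2-D clusters: two that sit close together and one far away.
near = [(0.0, 0.0), (0.1, 0.0)]
close = [(0.2, 0.0), (0.3, 0.0)]
far = [(5.0, 5.0), (5.1, 5.0)]

print(cross_information_potential(near, close))
print(cross_information_potential(near, far))
```

Clusters that overlap yield a larger value than well-separated ones, so a clustering procedure would look for partitions with low cross information potential between clusters.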

International Conference on Artificial Neural Networks | 2012

Comparative study on information theoretic clustering and classical clustering algorithms

Daniel de Araújo; Adrião Duarte Dória Neto; Allan de Medeiros Martins

This paper proposes a comparative empirical study of clustering algorithms. We tested the method proposed in [2] using distinct synthetic and real (gene expression) datasets. We chose synthetic datasets with different spatial complexity to verify the applicability of the algorithm. We also evaluated the information-theoretic (IT) algorithm on real-life problems by using microarray gene expression datasets. Compared with simple but still widely used classical algorithms, namely k-means, hierarchical clustering and finite mixture of Gaussians, the IT algorithm proved more robust in both proposed scenarios.

International Conference on Artificial Neural Networks | 2017

A Feature Selection Approach Based on Information Theory for Classification Tasks

Jhoseph Jesus; Anne M. P. Canuto; Daniel de Araújo

This paper proposes the use of an Information Theory measure in a dynamic feature selection approach. We tested this approach, which includes elements of Information Theory such as Mutual Information, and compared it with classical methods like PCA and LDA as well as with Mutual Information based algorithms. Results showed that the proposed method achieved better performance in most cases when compared with the other methods. Based on this, we conclude that the proposed approach is very promising, since it achieved better performance than well-established dimensionality reduction methods.
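As a reminder of the measure named above, mutual information between a discrete feature and the class labels can be estimated from co-occurrence counts. The sketch below shows only this plug-in estimate and a simple ranking by it; it is not the dynamic selection procedure of the paper, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(
    feature: Sequence[Hashable], labels: Sequence[Hashable]
) -> float:
    """Plug-in estimate of I(feature; labels) in nats for discrete values."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (n_x * n_y).
        ratio = count * n / (feature_counts[x] * label_counts[y])
        mi += (count / n) * math.log(ratio)
    return mi


# Hypothetical toy data: one informative and one uninformative feature.
labels = [0, 0, 1, 1]
features = {
    "informative": [0, 0, 1, 1],
    "uninformative": [0, 1, 0, 1],
}
ranked = sorted(
    features,
    key=lambda name: mutual_information(features[name], labels),
    reverse=True,
)
print(ranked)
```

A feature that determines the label reaches the label entropy (log 2 nats here), while a feature that is independent of the label scores zero.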

International Conference on Artificial Neural Networks | 2016

A Combination Method for Reducing Dimensionality in Large Datasets

Daniel de Araújo; Jhoseph Jesus; Adrião Duarte Dória Neto; Allan de Medeiros Martins

The amount of data in the world is growing exponentially due to the large number of applications across a wide variety of contexts. This data needs to be analyzed in order to extract valuable underlying information from it. Machine learning is a useful tool for this task, but the high complexity of the data forces the use of other methods to reduce such complexity. Dimensionality reduction (feature selection) is one of the most used methods to achieve this goal. As usual, many algorithms have been proposed to reduce the dimension of data, each one with its own advantages and drawbacks. This variety of algorithms usually leads researchers to test several methods and choose the best solution. Based on that, this paper proposes a combination of feature selection algorithms in order to create a single and more stable solution. We tested this approach using real datasets and machine learning algorithms. Results showed that the combined solution can be used with little or no loss in classification accuracy. Our method can therefore be used as a stable choice when there is little knowledge about the problem.
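One simple way to combine the outputs of several feature selection algorithms is rank aggregation. The sketch below uses a Borda count, which is an assumption chosen for illustration and not necessarily the combination rule used in the paper; the function name and the example rankings are hypothetical.

```python
from typing import Dict, List


def borda_fusion(rankings: List[List[str]]) -> List[str]:
    """Merge several best-first feature rankings into one consensus ranking.

    Each selector awards n points to its top feature, n - 1 to the next,
    and so on; ties in the total are broken alphabetically.
    """
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    return sorted(scores, key=lambda feature: (-scores[feature], feature))


# Hypothetical rankings produced by three different selectors.
selector_rankings = [
    ["f1", "f2", "f3"],
    ["f2", "f1", "f3"],
    ["f2", "f3", "f1"],
]
consensus = borda_fusion(selector_rankings)
print(consensus)
```

Keeping the top-k features of the consensus ranking gives a single subset that does not depend on picking one selector in advance.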

IEEE International Smart Cities Conference | 2016

Social smart city: A platform to analyze social streams in smart city initiatives

Arthur Souza; Mickael Figueredo; Nélio Cacho; Daniel de Araújo; Jazon Coelho; Carlos Prolo

A central issue in the context of smart cities is how to analyze, in real time, a large amount of data generated by different kinds of sources. This paper reports a case study in real-time acquisition of crime detection information from social media messages, built on top of a platform for fast processing and visualization of data from Twitter. The purpose is to allow city managers to act in a timely manner to prevent crime occurrences detected from tweets posted by real users. Key issues here are the processing of a large volume of data and the modularization and customization capabilities implemented through pipelined modules for robust, fast, real-time tweet acquisition and storage. In particular, the customization is reflected in modules for several kinds of filtering and natural language processing tasks, topped by a machine learning analysis that classifies the input messages according to the local policy category system.

Brazilian Conference on Intelligent Systems | 2016

Fusion Approaches of Feature Selection Algorithms for Classification Problems

Jhoseph Jesus; Daniel de Araújo; Anne M. P. Canuto

The large amount of data produced by applications in recent years needs to be analyzed in order to extract valuable underlying information from it. Machine learning algorithms are useful tools to perform this task, but it is usually necessary to reduce the complexity of the data using feature selection algorithms. As usual, many algorithms have been proposed to reduce the dimension of data, each one with its own advantages and drawbacks. This variety of algorithms leads one either to choose a single algorithm or to combine several methods. The latter option usually brings better performance. Based on this, this paper proposes an analysis of two distinct approaches to combining feature selection algorithms (decision fusion and data fusion). This analysis was made in a supervised classification context using real and synthetic datasets. Results showed that one of the proposed approaches (decision fusion) achieved the best results for the majority of datasets.

Pattern Recognition Letters | 2013

Representative cross information potential clustering

Daniel de Araújo; Adrião Duarte Dória Neto; Allan de Medeiros Martins

This paper proposes an information-theoretic approach for clustering with a new measure of cross information potential and two clustering algorithms. Instead of using all points of the dataset, the proposed measure uses representative points to quantify the interaction between distributions without any loss of the original properties of cross information potential. This brings a double advantage. First, it decreases the cost of computing the cross information potential, thus drastically reducing the running time. Second, it captures the interaction among the data points by utilizing the underlying statistics of the space region centered around the representative points. With this, we have made it possible to use cross information potential in applications where it previously could not be used. We also proposed two clustering algorithms that explore the idea of creating links between regions of the feature space that are highly correlated. We ran several tests and compared the results with the single linkage hierarchical algorithm, finite mixture of Gaussians and spectral clustering on both synthetic and real image segmentation datasets. Experiments showed that our approach achieved better results than the other algorithms and was capable of capturing the real structure of the data in most cases, regardless of its complexity. It also produced good image segmentation, with the advantage of a tuning parameter that provides a way of refining the segmentation.

Australasian Joint Conference on Artificial Intelligence | 2005

Individual clustering and homogeneous cluster ensemble approaches applied to gene expression data

Shirlly C. M. Silva; Daniel de Araújo; Raul B. Paradeda; Valmar S. Severiano-Sobrinho; Marcílio Carlos Pereira de Souto

Exploratory data analysis and, in particular, data clustering can significantly benefit from combining multiple data partitions – cluster ensemble. In this context, we analyze the potential of applying cluster ensemble techniques to gene expression microarray data. Our experimental results show that there is often a significant improvement in the results obtained with the use of ensemble techniques when compared to those based on the clustering techniques used individually.
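A common way to combine multiple data partitions is a co-association matrix, which records how often each pair of samples lands in the same cluster across the ensemble. The sketch below illustrates that general idea only; it is an assumption for illustration and not necessarily one of the ensemble techniques evaluated in the paper, and the function name and label vectors are hypothetical.

```python
from typing import List


def co_association(partitions: List[List[int]]) -> List[List[float]]:
    """Fraction of partitions in which samples i and j share a cluster.

    Every partition is a list of cluster labels, one per sample, and all
    partitions must cover the same samples in the same order.
    """
    n_samples = len(partitions[0])
    counts = [[0.0] * n_samples for _ in range(n_samples)]
    for labels in partitions:
        for i in range(n_samples):
            for j in range(n_samples):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    n_partitions = len(partitions)
    return [[value / n_partitions for value in row] for row in counts]


# Hypothetical label vectors from three clustering runs on four samples.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
matrix = co_association(ensemble)
for row in matrix:
    print([round(value, 2) for value in row])
```

The matrix depends only on which samples are grouped together, not on the label values themselves, so runs that number their clusters differently can still be combined. A consensus partition can then be obtained by clustering this matrix as a similarity measure.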
