Publication


Featured research published by David Correa Martins.


BMC Bioinformatics | 2008

Feature selection environment for genomic applications

Fabrício Martins Lopes; David Correa Martins; Roberto M. Cesar

Background: Feature selection is a pattern recognition approach for choosing important variables according to some criteria in order to distinguish or explain certain phenomena (i.e., for dimensionality reduction). Many genomic and proteomic applications rely on feature selection to answer questions such as selecting signature genes that are informative about some biological state (e.g., normal tissues and several types of cancer) or inferring a prediction network among elements such as genes, proteins and external stimuli. In these applications, a recurrent problem is the lack of samples for adequately estimating the joint probabilities between element states. A myriad of feature selection algorithms and criterion functions have been proposed, although it is difficult to identify the best solution for each application.

Results: The intent of this work is to provide an open-source, multiplatform graphical environment for bioinformatics problems that supports many feature selection algorithms, criterion functions and graphic visualization tools such as scatterplots, parallel coordinates and graphs. A feature selection approach for growing genetic networks from seed genes (targets or predictors) is also implemented in the system.

Conclusion: The proposed feature selection environment allows data analysis using several algorithms, criterion functions and graphic visualization tools. Our experiments have shown the software's effectiveness in two distinct types of biological problems. In addition, the environment can be used in other pattern recognition applications, although its main focus is bioinformatics tasks.


Archive | 2007

Constructing Probabilistic Genetic Networks of Plasmodium falciparum from Dynamical Expression Signals of the Intraerythrocytic Development Cycle

Junior Barrera; Roberto M. Cesar; David Correa Martins; Ricardo Z. N. Vêncio; Emilio F. Merino; Marcio Yamamoto; Florencia Leonardi; Carlos Alberto Pereira; Hernando A. del Portillo

The completion of the genome sequence of Plasmodium falciparum revealed that close to 60% of the annotated genome corresponds to hypothetical proteins, and that many genes whose metabolic pathways or biological products are known have not been predicted by sequence similarity searches. Recently, using global gene expression of the asexual blood stages of P. falciparum at 1-hour resolution and techniques based on the Discrete Fourier Transform, it was demonstrated that many genes are regulated in a single periodic manner during the asexual blood stages. Moreover, by ordering the genes according to their phase of expression, a new list of targets for vaccine and drug development was generated. In the present paper, genes are annotated from a different perspective: a list of functional properties is attributed to networks of genes representing subsystems of the P. falciparum regulatory expression system. The model developed to represent genetic networks, called the Probabilistic Genetic Network (PGN), is a Markov chain with some additional properties. This model mimics the properties of a gene as a non-linear stochastic gate, and systems are built by coupling these gates. Moreover, a tool was developed that integrates the mining of dynamical expression signals by PGN design techniques with different databases and biological knowledge. The applicability of this tool for discovering gene networks of the malaria expression regulation system has been validated using the glycolytic pathway as a “gold standard”, as well as by creating an apicoplast PGN network. We are currently improving the network design technique before attempting to validate the results from the apicoplast PGN network through reverse genetics approaches.
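
The abstract describes the PGN as a Markov chain in which each gene behaves as a non-linear stochastic gate. The sketch below illustrates only that basic idea, not the PGN's additional properties: it assumes binary genes that update independently given the current joint state, so each transition probability is a product of per-gene gate outputs. The function `transition_matrix` and the two gate functions are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Gate = Callable[[State], float]  # returns P(gene = 1 at t+1 | joint state at t)


def transition_matrix(n_genes: int, gates: Sequence[Gate]) -> Tuple[List[State], List[List[float]]]:
    """Build the Markov-chain transition matrix over all 2**n_genes joint states.

    Assumes each gene updates independently given the current joint state,
    so P(next | current) is the product of the per-gene gate probabilities.
    """
    states = list(product((0, 1), repeat=n_genes))
    matrix = []
    for current in states:
        probs_one = [gate(current) for gate in gates]
        row = []
        for nxt in states:
            p = 1.0
            for g in range(n_genes):
                p *= probs_one[g] if nxt[g] else 1.0 - probs_one[g]
            row.append(p)
        matrix.append(row)
    return states, matrix


# Hypothetical two-gene example: gene 0 is a noisy NOT of gene 1,
# gene 1 is a noisy copy of gene 0.
gates = [
    lambda s: 0.9 if s[1] == 0 else 0.1,
    lambda s: 0.8 if s[0] == 1 else 0.2,
]
states, P = transition_matrix(2, gates)
for state, row in zip(states, P):
    print(state, [round(p, 3) for p in row])
```

Coupling gates this way keeps every row a probability distribution over next states, which is what makes the coupled system a Markov chain.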


IEEE Journal of Selected Topics in Signal Processing | 2008

Intrinsically Multivariate Predictive Genes

David Correa Martins; Ulisses Braga-Neto; Ronaldo Fumio Hashimoto; Michael L. Bittner; Edward R. Dougherty

Canalizing genes possess such broad regulatory power, and their action sweeps across such a wide swath of processes, that the full set of affected genes is not highly correlated under normal conditions. When not active, the controlling gene will not be predictable to any significant degree by its subject genes, either alone or in groups, since their behavior will be highly varied relative to the inactive controlling gene. When the controlling gene is active, its behavior is not well predicted by any one of its targets, but can be very well predicted by groups of genes under its control. To investigate this phenomenon, we introduce in this paper the concept of intrinsically multivariate predictive (IMP) genes and present a mathematical study of IMP in the context of binary genes with respect to the coefficient of determination (CoD), which measures the predictive power of a set of genes with respect to a target gene. A set of predictor genes is said to be IMP for a target gene if all properly contained subsets of the predictor set are bad predictors of the target but the full predictor set predicts the target with great accuracy. We show that the logic of prediction, predictive power, covariance between predictors, and the entropy of the joint probability distribution of the predictors jointly affect the appearance of IMP genes. In particular, we show that high predictive power, small covariance among predictors, a large entropy of the joint probability distribution of predictors, and certain logics, such as XOR in the 2-predictor case, are factors that favor the appearance of IMP. The IMP concept is applied to characterize the behavior of the gene DUSP1, which exhibits control over a central, process-integrating signaling pathway, thereby providing preliminary evidence that IMP can be used as a criterion for discovering canalizing genes.
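
The CoD and the IMP condition can be made concrete with a small sketch. It estimates the CoD from samples as (ε0 - ε)/ε0, where ε0 is the error of predicting the target without observing any predictor and ε is the resubstitution error of the majority-vote predictor given the chosen genes. This estimator and the helper names (`optimal_error`, `cod`, `imp_score`) are illustrative assumptions; `imp_score` (full-set CoD minus the best proper-subset CoD) is one simple way to quantify IMP, not necessarily the paper's exact definition.

```python
from collections import Counter, defaultdict
from itertools import combinations, product
from typing import List, Sequence, Tuple

Sample = Tuple[int, ...]


def optimal_error(patterns: Sequence[Tuple[int, ...]], target: Sequence[int]) -> float:
    """Resubstitution error of the majority-vote predictor of `target` given `patterns`."""
    groups = defaultdict(Counter)
    for x, y in zip(patterns, target):
        groups[x][y] += 1
    wrong = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return wrong / len(target)


def cod(samples: List[Sample], target: List[int], subset: Sequence[int]) -> float:
    """CoD = (eps0 - eps) / eps0; eps0 is the error when no predictor is observed."""
    eps0 = optimal_error([()] * len(target), target)
    if eps0 == 0:
        raise ValueError("constant target: CoD is undefined")
    eps = optimal_error([tuple(s[i] for i in subset) for s in samples], target)
    return (eps0 - eps) / eps0


def imp_score(samples: List[Sample], target: List[int], subset: Sequence[int]) -> float:
    """Hypothetical score: CoD of the full set minus the best CoD of any nonempty proper subset."""
    if len(subset) < 2:
        raise ValueError("IMP needs at least two predictors")
    best_part = max(
        cod(samples, target, part)
        for r in range(1, len(subset))
        for part in combinations(subset, r)
    )
    return cod(samples, target, subset) - best_part


# Every combination of two binary genes, once each.
samples = list(product((0, 1), repeat=2))
xor_target = [a ^ b for a, b in samples]   # target is the XOR of the two genes
copy_target = [a for a, _ in samples]      # target simply copies gene 0
print(imp_score(samples, xor_target, [0, 1]))   # prints 1.0
print(imp_score(samples, copy_target, [0, 1]))  # prints 0.0
```

With XOR, each gene alone carries no information about the target, so the pair is IMP; when the target copies gene 0, a single gene already predicts it perfectly and the score drops to zero.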


Information Sciences | 2014

A feature selection technique for inference of graphs from their known topological properties: Revealing scale-free gene regulatory networks

Fabrício Martins Lopes; David Correa Martins; Junior Barrera; Roberto M. Cesar

An important problem in bioinformatics is the inference of gene regulatory networks (GRNs) from expression profiles. In general, the main limitations faced by GRN inference methods are the small number of samples relative to the huge dimensionality and the noisy nature of the expression measurements. Alternatives are thus needed to obtain better accuracy in GRN inference. Many pattern recognition techniques rely on prior knowledge about the problem, in addition to the training data, to gain statistical estimation power. This work addresses the GRN inference problem by modeling prior knowledge about the network topology. The main contribution of this paper is a novel methodology that incorporates scale-free properties into a classical low-cost feature selection method, known as Sequential Floating Forward Selection (SFFS), to guide the inference task. The methodology explores the search space iteratively, applying a scale-free property to reduce it. In this way, the search integrates an exhaustive exploration of all combinations of predictor sets when the number of combinations is small (dimensionality $\langle k \rangle \le 2$) with a floating search when the number of combinations becomes explosive (dimensionality $\langle k \rangle \ge 3$). This process is guided by scale-free prior information. Experimental results using synthetic and real data show that this technique yields smaller estimation errors than those obtained without guiding the SFFS by the scale-free model, while maintaining the robustness of the SFFS method. Therefore, we show that the proposed framework may be applied in combination with other existing GRN inference methods to improve the prediction accuracy of networks with scale-free properties.
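
A minimal sketch of the switch the abstract describes: exhaustive search over predictor sets when the dimension is at most 2, and a textbook SFFS otherwise. The scale-free prior that the paper uses to choose and guide the dimension is not modeled here; `k`, `crit`, `sffs` and `hybrid_search` are hypothetical names, `crit` is assumed to be lower-is-better, and the toy criterion exists only to make the example checkable.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Criterion = Callable[[Sequence[int]], float]


def sffs(n_features: int, k: int, crit: Criterion) -> Tuple[float, List[int]]:
    """Textbook sequential floating forward selection (lower criterion is better)."""
    if not 0 < k <= n_features:
        raise ValueError("k must be between 1 and n_features")
    selected: List[int] = []
    best: Dict[int, Tuple[float, List[int]]] = {}  # best (score, subset) seen per size
    while len(selected) < k:
        # Forward step: add the feature that gives the lowest criterion.
        rest = [f for f in range(n_features) if f not in selected]
        selected = selected + [min(rest, key=lambda f: crit(selected + [f]))]
        score = crit(selected)
        if len(selected) not in best or score < best[len(selected)][0]:
            best[len(selected)] = (score, selected)
        # Conditional backward steps: drop a feature only if that beats the best smaller set so far.
        while len(selected) > 2:
            drop = min(selected, key=lambda f: crit([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            reduced_score = crit(reduced)
            if reduced_score >= best[len(reduced)][0]:
                break
            selected = reduced
            best[len(reduced)] = (reduced_score, reduced)
    return best[k]


def hybrid_search(n_features: int, k: int, crit: Criterion) -> Tuple[float, List[int]]:
    """Exhaustive search for k <= 2 (few combinations), floating search for k >= 3."""
    if k <= 2:
        return min(((crit(list(c)), list(c)) for c in combinations(range(n_features), k)),
                   key=lambda t: t[0])
    return sffs(n_features, k, crit)


TRUE_PREDICTORS = {1, 3, 4}  # hypothetical ground truth for the toy criterion


def crit(subset: Sequence[int]) -> float:
    """Toy criterion: size of the symmetric difference with the true predictor set."""
    return len(set(subset) ^ TRUE_PREDICTORS)


print(hybrid_search(6, 3, crit))  # (0, [1, 3, 4])
```

The backward step is accepted only when it strictly improves the best score recorded for that size, which is what lets SFFS backtrack without cycling.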


Pattern Analysis and Applications | 2006

W-operator window design by minimization of mean conditional entropy

David Correa Martins; Roberto M. Cesar; Junior Barrera

This paper presents a technique that yields a minimal window W for estimating a W-operator from training data. The idea is to choose a subset of variables W that maximizes the information observed in a training set. The task is formalized as a combinatorial optimization problem, where the search space is the power set of the candidate variables and the measure to be minimized is the mean entropy of the estimated conditional probabilities. Since a full exploration of the search space requires prohibitive computational effort, some heuristics from the feature selection literature are applied. The proposed technique is mathematically sound, and experimental results, including binary image filtering and gray-scale texture recognition, show its successful performance in practice.
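
A minimal sketch of the criterion and one greedy heuristic, assuming binary variables: `samples` holds the observed window configurations (e.g., pixel values in a neighborhood), `target` the desired output, and the mean conditional entropy H(Y | X_W) = Σ_x P(x) H(Y | X_W = x) is estimated from observed frequencies. Sequential forward selection is used here as a stand-in for "some heuristics"; it is an assumption, not necessarily the paper's choice, and this plug-in estimate applies no correction for patterns observed too few times.

```python
import math
from collections import Counter, defaultdict
from itertools import product
from typing import List, Sequence, Tuple


def mean_conditional_entropy(samples: Sequence[Tuple[int, ...]], target: Sequence[int],
                             window: Sequence[int]) -> float:
    """Plug-in estimate of H(Y | X_W) in bits: sum over patterns x of P(x) * H(Y | X_W = x)."""
    groups = defaultdict(Counter)
    for s, y in zip(samples, target):
        groups[tuple(s[i] for i in window)][y] += 1
    n = len(target)
    h = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        h_x = -sum((c / m) * math.log2(c / m) for c in counts.values())
        h += (m / n) * h_x
    return h


def sfs_window(samples: Sequence[Tuple[int, ...]], target: Sequence[int], max_size: int) -> List[int]:
    """Grow W one variable at a time, adding the variable that most lowers the estimate."""
    n_vars = len(samples[0])
    window: List[int] = []
    while len(window) < min(max_size, n_vars):
        rest = [v for v in range(n_vars) if v not in window]
        window.append(min(rest, key=lambda v: mean_conditional_entropy(samples, target, window + [v])))
        if mean_conditional_entropy(samples, target, window) == 0.0:
            break  # the output is fully determined by the window on the training data
    return window


# Hypothetical 3-variable neighborhood; the ideal output is x0 AND x2.
samples = list(product((0, 1), repeat=3))
target = [s[0] & s[2] for s in samples]
print(sfs_window(samples, target, 3))  # [0, 2]
```

Because the plug-in estimate never increases as W grows, the sketch caps the window size with the hypothetical `max_size` and stops early once the output is determined.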


Pattern Recognition | 2010

U-curve: A branch-and-bound optimization algorithm for U-shaped cost functions on Boolean lattices applied to the feature selection problem

Marcelo Ris; Junior Barrera; David Correa Martins

This paper presents the formulation of a combinatorial optimization problem with the following characteristics: (i) the search space is the power set of a finite set structured as a Boolean lattice; (ii) the cost function forms a U-shaped curve when applied to any lattice chain. This formulation applies to feature selection in the context of pattern recognition. The known approaches for this problem are branch-and-bound algorithms and heuristics that partially explore the search space. Branch-and-bound algorithms are equivalent to a full search, while heuristics are not. This paper presents a branch-and-bound algorithm that differs from the known ones by exploiting the lattice structure and the U-shaped chain curves of the search space. The main contribution of this paper is the architecture of this algorithm, which is based on the representation and exploration of the search space through new lattice properties proven here. Several experiments with well-known public data indicate the superiority of the proposed method over sequential floating forward selection (SFFS), a popular heuristic that gives good results in very short computational time. In all experiments, the proposed method obtained better or equal results in similar or even smaller computational time.
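
The U-shaped assumption is what makes pruning safe. If A ⊂ B and c(A) < c(B), then along any chain through A, B and a superset S the minimum must lie at or before B, so c(S) ≥ c(B) > c(A) and no superset of B can be optimal. The sketch below applies only this rule while visiting the lattice level by level; it is not the paper's U-curve algorithm, and `u_shaped_pruned_search` and the toy cost are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

Subset = FrozenSet[int]


def u_shaped_pruned_search(n: int, cost: Callable[[Subset], float]) -> Tuple[Subset, float, int]:
    """Visit subsets of {0, ..., n-1} by increasing size, skipping every superset of a
    set B whose cost exceeds that of one of its immediate subsets.

    Correct only if `cost` is U-shaped on every chain of the Boolean lattice.
    Returns (best subset, its cost, number of cost evaluations).
    """
    costs: Dict[Subset, float] = {}
    cuts: List[Subset] = []  # sets whose supersets are all pruned
    for size in range(n + 1):
        for combo in combinations(range(n), size):
            s = frozenset(combo)
            if any(b <= s for b in cuts):
                continue
            c = cost(s)
            costs[s] = c
            # Every immediate subset of an unpruned set was evaluated at the previous level.
            if any(costs[s - {i}] < c for i in s):
                cuts.append(s)
    best = min(costs, key=costs.get)
    return best, costs[best], len(costs)


# Toy cost: U-shaped on every chain because each weight is below 1.
weights = [0.05, 0.3, 0.1, 0.2, 0.4, 0.15]


def cost(s: Subset) -> float:
    return (len(s) - 2) ** 2 + sum(weights[i] for i in s)


best, value, evaluated = u_shaped_pruned_search(len(weights), cost)
print(sorted(best), round(value, 2), evaluated, "of", 2 ** len(weights))  # [0, 2] 0.15 42 of 64
```

On this toy cost the search evaluates 42 of the 64 subsets and still returns the global minimum, which is the kind of saving over a full search that branch-and-bound methods aim for.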


BMC Bioinformatics | 2013

Gene regulatory networks inference using a multi-GPU exhaustive search algorithm

Fabrizio F. Borelli; Raphael Y. de Camargo; David Correa Martins; Luiz C. S. Rozante

Background: Gene regulatory network (GRN) inference is an important bioinformatics problem in which gene interactions must be deduced from gene expression data, such as microarray data. Feature selection methods can be applied to this problem. A feature selection technique is composed of two parts: a search algorithm and a criterion function. Among the search algorithms already proposed is exhaustive search, which returns the best feature subset, although its computational cost is prohibitive in almost all situations. The objective of this work is to develop a low-cost parallel solution for exhaustive search, based on GPU architectures, with a viable cost-benefit ratio. We use CUDA™, a general-purpose parallel programming platform that allows NVIDIA® GPUs to be used to solve complex problems efficiently.

Results: We developed a parallel algorithm for GRN inference based on multiple GPU cards and obtained encouraging speedups (on the order of hundreds) when assuming that each target gene has two multivariate predictors. Experiments using single and multiple GPUs were also performed, indicating that the speedup grows almost linearly with the number of GPUs.

Conclusion: In this work, we present a proof of principle showing that it is possible to parallelize the exhaustive search algorithm on GPUs with encouraging results. Although our focus in this paper is the GRN inference problem, the GPU-based exhaustive search technique developed here can be applied (with minor adaptations) to other combinatorial problems.
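
The core of such an exhaustive search is mapping each work item to one predictor pair, so that the n(n-1)/2 candidates can be split across workers. The sketch below assumes binary expression data, uses the majority-vote resubstitution error as a stand-in criterion function, and splits the index range across a `ThreadPoolExecutor` only to mirror how a GPU kernel might partition the work; in CPython the threads give no real speedup, and the paper's CUDA implementation is not reproduced. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import List, Optional, Tuple

Result = Tuple[float, Tuple[int, int]]


def index_to_pair(idx: int, n: int) -> Tuple[int, int]:
    """Map a linear index in [0, n*(n-1)/2) to the idx-th pair (i, j), i < j, in lexicographic order."""
    i, remaining = 0, idx
    while remaining >= n - 1 - i:  # skip the n-1-i pairs whose first element is i
        remaining -= n - 1 - i
        i += 1
    return i, i + 1 + remaining


def pair_error(genes: List[List[int]], target: List[int], i: int, j: int) -> float:
    """Criterion: resubstitution error of the majority-vote predictor on (gene i, gene j)."""
    groups = defaultdict(Counter)
    for a, b, y in zip(genes[i], genes[j], target):
        groups[(a, b)][y] += 1
    wrong = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return wrong / len(target)


def best_pair_in_range(genes: List[List[int]], target: List[int], start: int, stop: int) -> Optional[Result]:
    """Work done by one worker (standing in for one block of GPU threads)."""
    best: Optional[Result] = None
    for idx in range(start, stop):
        i, j = index_to_pair(idx, len(genes))
        cand = (pair_error(genes, target, i, j), (i, j))
        if best is None or cand < best:
            best = cand
    return best


def exhaustive_pair_search(genes: List[List[int]], target: List[int], workers: int = 4) -> Result:
    total = len(genes) * (len(genes) - 1) // 2
    if total == 0:
        raise ValueError("need at least two candidate predictor genes")
    step = -(-total // workers)  # ceiling division so every index is covered
    ranges = [(s, min(s + step, total)) for s in range(0, total, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda r: best_pair_in_range(genes, target, *r), ranges)
        return min(p for p in partial if p is not None)


# Hypothetical data: 5 binary genes over all 32 joint states; target = gene1 XOR gene3.
rows = list(product((0, 1), repeat=5))
genes = [[row[g] for row in rows] for g in range(5)]
target = [row[1] ^ row[3] for row in rows]
print(exhaustive_pair_search(genes, target))  # (0.0, (1, 3))
```

A GPU version would more likely compute the pair from the thread index in closed form than with this linear scan, but the decomposition into independent index ranges followed by a final reduction is the same.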


International Conference on Bioinformatics | 2009

Comparative study of GRNs inference methods based on feature selection by mutual information

Fabrício Martins Lopes; David Correa Martins; Roberto M. Cesar

Feature selection is a crucial topic in pattern recognition applications, especially in the genetic regulatory network (GRN) inference problem, which usually involves data with a large number of variables and a small number of observations. In this context, applying dimensionality reduction approaches such as those based on feature selection becomes a mandatory step for selecting the most important predictor genes that can explain phenomena associated with the target genes. Given its importance in GRN inference, many feature selection methods (algorithms and criterion functions) have been proposed. However, it is essential to validate such results in order to better understand their significance. The present work proposes a comparative study of feature selection techniques involving information theory concepts, applied to the estimation of GRNs from simulated temporal expression data generated by an artificial gene network (AGN) model. Four GRN inference methods are compared in terms of global network measures. Some interesting conclusions can be drawn from the experimental results.
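
As a generic illustration of an information-theoretic criterion on temporal data (not a reconstruction of any of the four compared methods), the sketch scores each candidate regulator by the plug-in mutual information between its state at time t and the target's state at time t+1. The series, the gene names and the one-step `lag` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # (c/n) * log2( (c/n) / ((px/n) * (py/n)) ), rewritten as log2(c*n / (px*py))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def rank_predictors(series: Dict[str, List[int]], target: str, lag: int = 1) -> List[str]:
    """Rank candidate regulators by I(X_t ; Y_{t+lag}); `lag` must be at least 1."""
    future = series[target][lag:]
    scores = {g: mutual_information(s[:-lag], future) for g, s in series.items() if g != target}
    return sorted(scores, key=scores.get, reverse=True)


g0 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
series = {
    "g0": g0,
    "g1": [0, 1] * 6,       # alternating gene, only weakly related to the target
    "g2": [1] * 12,         # constant gene: carries no information
    "t": [0] + g0[:-1],     # target repeats g0 one time step later
}
print(rank_predictors(series, "t"))  # ['g0', 'g1', 'g2']
```

Ranking by a pairwise score like this is cheap, but it cannot detect intrinsically multivariate predictors such as XOR pairs, which is one reason set-based feature selection is used for GRN inference.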


Pattern Recognition in Bioinformatics | 2010

SFFS-MR: a floating search strategy for GRNs inference

Fabrício Martins Lopes; David Correa Martins; Junior Barrera; Roberto M. Cesar

An important problem in the bioinformatics field is the inference of gene regulatory networks (GRNs) from temporal expression profiles. In general, the main limitations faced by GRN inference methods are the small number of samples relative to the huge dimensionality and the noisy nature of the expression measurements. In the face of these limitations, alternatives are needed to obtain better accuracy in GRN inference. In this context, this work addresses the problem by presenting an alternative feature selection method, called SFFS-MR, that applies prior knowledge in its search strategy. The proposed search strategy is based on the SFFS algorithm, with the inclusion of multiple roots at the beginning of the search, which are defined by the best and worst single results of the SFS algorithm. In this way, the search space traversed is guided by these roots in order to find the predictor genes for a given target gene, especially to better identify genes presenting intrinsically multivariate prediction, without worsening the asymptotic computational cost of the SFFS. Experimental results show that SFFS-MR provides better inference accuracy than SFS and SFFS, while maintaining robustness similar to that of the SFS and SFFS methods. In addition, SFFS-MR achieved 60% accuracy in network recovery after only 20 observations from a state space of size $2^{20}$, a very good result.
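
A simplified sketch of the multiple-root idea: rank single features, start a search from the best and from the worst one, and keep the best result. To stay short it runs plain SFS (not SFFS) from each root and uses the majority-vote resubstitution error as the criterion; these choices and all names (`sfs_from`, `multi_root_search`, `n_best`, `n_worst`) are assumptions. In the toy data the target is the XOR of genes 1 and 2, while gene 0 agrees with the target in 6 of 8 samples, so a greedy search started from the best single gene misses the XOR pair.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Criterion = Callable[[Sequence[int]], float]


def majority_error(rows, target, subset: Sequence[int]) -> float:
    """Resubstitution error of the majority-vote predictor of target given the chosen features."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, target):
        groups[tuple(row[i] for i in subset)][y] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values()) / len(target)


def sfs_from(root: int, n_features: int, k: int, crit: Criterion) -> Tuple[List[int], float]:
    """Plain sequential forward selection that starts from a fixed root feature."""
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and n_features")
    selected = [root]
    while len(selected) < k:
        rest = [f for f in range(n_features) if f not in selected]
        selected.append(min(rest, key=lambda f: crit(selected + [f])))
    return selected, crit(selected)


def multi_root_search(n_features: int, k: int, crit: Criterion,
                      n_best: int = 1, n_worst: int = 1) -> Tuple[List[int], float]:
    """Run SFS from the n_best best and n_worst worst single features; keep the best result."""
    ranked = sorted(range(n_features), key=lambda f: crit([f]))
    roots = list(dict.fromkeys(ranked[:n_best] + ranked[len(ranked) - n_worst:]))
    return min((sfs_from(r, n_features, k, crit) for r in roots), key=lambda res: res[1])


# Toy data: target = gene1 XOR gene2; gene 0 is a noisy copy of the target.
rows, target = [], []
for b in (0, 1):
    for c in (0, 1):
        for z in (0, 1):
            y = b ^ c
            rows.append((y ^ (z & b), b, c))  # gene 0 disagrees with y only when z = b = 1
            target.append(y)


def crit(subset: Sequence[int]) -> float:
    return majority_error(rows, target, subset)


print(sfs_from(0, 3, 2, crit))        # ([0, 1], 0.25)
print(multi_root_search(3, 2, crit))  # ([2, 1], 0.0)
```

Starting from the worst single gene, which is individually uninformative but half of an intrinsically multivariate pair, recovers a zero-error predictor set.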


Information Sciences | 2013

Signal propagation in Bayesian networks and its relationship with intrinsically multivariate predictive variables

David Correa Martins; Evaldo A. de Oliveira; Ulisses Braga-Neto; Ronaldo Fumio Hashimoto; Roberto M. Cesar

A set of predictor variables is said to be intrinsically multivariate predictive (IMP) for a target variable if all properly contained subsets of the predictor set are poor predictors of the target but the full set predicts the target with great accuracy. In a previous article, the main properties of IMP Boolean variables were analytically described, including the introduction of the IMP score, a metric based on the coefficient of determination (CoD) as a measure of predictiveness with respect to the target variable. It was shown that the IMP score depends on four main properties: logic of connection, predictive power, covariance between predictors and marginal predictor probabilities (biases). This paper extends that work to a broader context, in an attempt to characterize properties of discrete Bayesian networks that contribute to the presence of variables (network nodes) with high IMP scores. We have found that there is a relationship between the IMP score of a node and its territory size, i.e., its position along a pathway with one source: nodes far from the source display larger IMP scores than those closer to the source, and longer pathways display larger maximum IMP scores. This appears to be a consequence of the fact that nodes with small territory have a larger probability of having highly covariate predictors, which leads to smaller IMP scores. In addition, a larger number of XOR and NXOR predictive logic relationships has a positive influence on the maximum IMP score found in the pathway. This work presents analytical results based on a network with a simple structure and an analysis involving random networks constructed by computational simulations. Finally, results from a real Bayesian network application are provided.
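
The dependence of the IMP score on the predictor biases can be checked exactly in the simplest case. The sketch below assumes Y = X1 XOR X2 with independent predictors X1 ~ Bernoulli(p) and X2 ~ Bernoulli(q), computes each CoD from the exact joint distribution using `fractions.Fraction`, and takes the score as the full-set CoD minus the larger single-predictor CoD; that score definition and the name `xor_imp_score` are assumptions for illustration, and 0 < p, q < 1 is required so the CoD is defined.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, List, Tuple

Observe = Callable[[int, int], Tuple[int, ...]]  # which predictor values are observed


def xor_imp_score(p: Fraction, q: Fraction) -> Fraction:
    """IMP-style score for Y = X1 xor X2 with independent X1 ~ Bernoulli(p), X2 ~ Bernoulli(q)."""
    joint = {(a, b): (p if a else 1 - p) * (q if b else 1 - q)
             for a, b in product((0, 1), repeat=2)}

    def bayes_error(observe: Observe) -> Fraction:
        # mass[pattern] = [P(Y=0, pattern), P(Y=1, pattern)]; the optimal predictor
        # errs with the smaller of the two masses for each observed pattern.
        mass: Dict[Tuple[int, ...], List[Fraction]] = {}
        for (a, b), prob in joint.items():
            mass.setdefault(observe(a, b), [Fraction(0), Fraction(0)])[a ^ b] += prob
        return sum((min(m) for m in mass.values()), Fraction(0))

    eps0 = bayes_error(lambda a, b: ())  # error with no predictor observed

    def cod(observe: Observe) -> Fraction:
        return (eps0 - bayes_error(observe)) / eps0

    full = cod(lambda a, b: (a, b))
    return full - max(cod(lambda a, b: (a,)), cod(lambda a, b: (b,)))


print(xor_imp_score(Fraction(1, 2), Fraction(1, 2)))   # 1
print(xor_imp_score(Fraction(9, 10), Fraction(1, 2)))  # 1/5
```

With unbiased predictors the score is 1; biasing X1 toward 1 makes X2 alone a good predictor of Y (CoD 4/5), so the score falls to 1/5 even though the XOR logic is unchanged.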

Collaboration


Frequent collaborators of David Correa Martins.

Top Co-Authors

Junior Barrera

University of São Paulo

Fabrício Martins Lopes

Federal University of Technology - Paraná

Luiz C. S. Rozante

Universidade Federal do ABC

Marcelo Ris

University of São Paulo
