Eduardo De Paula Costa

Publication

Featured research published by Eduardo De Paula Costa.

Genetics and Molecular Biology | 2005

Identification and frequency of transposable elements in Eucalyptus

Maurício Bacci; Rafael B.S. Soares; Eloiza Helena Tajara; Guilherme Ambar; Carlos Norberto Fischer; Ivan Rizzo Guilherme; Eduardo De Paula Costa; Vitor Fernandes Oliveira de Miranda

Transposable elements (TE) are major components of eukaryotic genomes and are involved in cell regulation and organism evolution. We analyzed 123,889 expressed sequence tags of the Eucalyptus Genome Project database and found 124 sequences representing 76 TE in 9 groups, of which the copia, MuDR and FAR1 groups were the most abundant. The low number of TE sequences may reflect highly efficient repression of these elements, a process called TE silencing. The frequency of TE groups in Eucalyptus libraries prepared from different tissues or physiological conditions of seedlings or adult plants indicated that developing plants express a much wider spectrum of TE groups than adult plants do. These preliminary results identify the most relevant TE groups involved in Eucalyptus development, which is important for industrial wood production.

Intelligent Data Analysis | 2013

Estimating Prediction Certainty in Decision Trees

Eduardo De Paula Costa; Sicco Verwer; Hendrik Blockeel

Decision trees estimate prediction certainty using the class distribution in the leaf responsible for the prediction. We introduce an alternative method that yields better estimates. For each instance to be classified, our method inserts the instance into the training set with one of the possible labels for the target attribute and learns a tree; this procedure is repeated for each of the labels. Then, by comparing the outcomes of the different trees, the method can identify instances that may be difficult to classify correctly, and attribute some uncertainty to their prediction. We perform an extensive evaluation of the proposed method, and show that it is particularly suitable for ranking and reliability estimation. The ideas investigated in this paper may also be applied to other machine learning techniques, as well as combined with other methods for prediction certainty estimation.
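The label-insertion idea in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny Gini-based tree learner, the depth limit, the function names (`build_tree`, `leaf_distribution`, `certainty_by_insertion`), and in particular the step that averages the leaf distributions of the per-label trees. The paper's actual learner and combination rule may differ.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, max_depth, depth=0):
    """Grow a small axis-aligned tree; a leaf is a Counter of class counts."""
    if depth >= max_depth or len(set(labels)) == 1:
        return Counter(labels)
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so that both sides of the split are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all rows identical: cannot split
        return Counter(labels)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], max_depth, depth + 1),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], max_depth, depth + 1))

def leaf_distribution(tree, x):
    """Class distribution of the leaf that instance x falls into."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    total = sum(tree.values())
    return {c: n / total for c, n in tree.items()}

def certainty_by_insertion(train_X, train_y, x, max_depth=2):
    """Insert x once per candidate label, retrain, and average the resulting
    leaf distributions for x (the averaging step is an assumption)."""
    classes = sorted(set(train_y))
    totals = {c: 0.0 for c in classes}
    for label in classes:
        tree = build_tree(train_X + [x], train_y + [label], max_depth)
        dist = leaf_distribution(tree, x)
        for c in classes:
            totals[c] += dist.get(c, 0.0)
    return {c: v / len(classes) for c, v in totals.items()}

# Toy data: class "a" at 0..4, class "b" at 10..14 (one feature).
X = [[v] for v in range(5)] + [[v] for v in range(10, 15)]
y = ["a"] * 5 + ["b"] * 5
near_a = certainty_by_insertion(X, y, [1])   # deep inside the "a" region
between = certainty_by_insertion(X, y, [7])  # between the two regions
```

In this toy setting the instance lying between the two classes is pulled to whichever label it is inserted with, so the per-label trees disagree and its estimate is split evenly, while the instance inside the "a" region keeps a clear majority for "a".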

Evolutionary Bioinformatics | 2013

Top-Down Clustering for Protein Subfamily Identification

Eduardo De Paula Costa; Celine Vens; Hendrik Blockeel

We propose a novel method for the task of protein subfamily identification; that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree using as input a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and associates particular mutations with each division into subclusters. The motivating hypothesis for this method is that it may yield a better tree topology, and as a result more accurate subfamily identification, while additionally indicating functionally important sites and allowing for easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis. The novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow for classifying new sequences with an accuracy approaching that of hidden Markov models.

PLOS Computational Biology | 2018

A machine learning based framework to identify and classify long terminal repeat retrotransposons

Leander Schietgat; Celine Vens; Ricardo Cerri; Carlos Norberto Fischer; Eduardo De Paula Costa; Jan Ramon; Claudia Marcia Aparecida Carareto; Hendrik Blockeel

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by having long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance, while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner’s predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.

Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics | 2010

Top-down induction of phylogenetic trees

Celine Vens; Eduardo De Paula Costa; Hendrik Blockeel

We propose a novel distance-based method for phylogenetic tree reconstruction. Our method is based on a conceptual clustering method that extends the well-known decision tree learning approach. It starts from a single cluster and repeatedly splits it into subclusters until each sequence forms its own cluster. We assume that a split can be described by referring to particular polymorphic locations, which makes such a divisive method computationally feasible. To define the best split, we use a criterion that is close to Neighbor Joining’s optimization criterion, namely, minimizing total branch length. A thorough experimental evaluation shows that our method yields phylogenetic trees with an accuracy comparable to that of existing methods. Moreover, it has a number of important advantages. First, by listing the polymorphic locations at the internal nodes, it provides an explanation for the resulting tree topology. Second, the top-down tree growing process can be stopped before a complete tree is generated, yielding an efficient gene or protein subfamily identification approach. Third, the resulting trees can be used as classification trees to classify new sequences into subfamilies.
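The divisive procedure described above can be sketched roughly as follows. This is not the published algorithm: the split heuristic (minimizing the summed within-group pairwise Hamming distance) stands in for the paper's total-branch-length criterion, splits are multiway by residue rather than necessarily binary, and all names (`split_top_down`, `classify`, the toy sequences) are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def within_cost(group, seqs):
    """Sum of pairwise Hamming distances inside one group of sequence names."""
    return sum(hamming(seqs[a], seqs[b]) for a, b in combinations(group, 2))

def split_top_down(names, seqs):
    """Recursively split a cluster on one polymorphic alignment position.

    Returns a list of names for a leaf, or (position, {residue: subtree})
    for an internal node. The cost function is an assumed stand-in for the
    branch-length criterion mentioned in the abstract.
    """
    if len(names) == 1:
        return list(names)
    best = None
    for pos in range(len(seqs[names[0]])):
        groups = {}
        for n in names:
            groups.setdefault(seqs[n][pos], []).append(n)
        if len(groups) < 2:
            continue  # position is not polymorphic within this cluster
        cost = sum(within_cost(g, seqs) for g in groups.values())
        if best is None or cost < best[0]:
            best = (cost, pos, groups)
    if best is None:  # identical sequences cannot be separated
        return sorted(names)
    _, pos, groups = best
    return (pos, {res: split_top_down(g, seqs) for res, g in groups.items()})

def classify(tree, seq):
    """Route a new aligned sequence down the tree via the listed positions."""
    while isinstance(tree, tuple):
        pos, children = tree
        if seq[pos] not in children:
            return None  # residue not seen at this node during training
        tree = children[seq[pos]]
    return tree

# Toy alignment (hypothetical data).
seqs = {"s1": "AAGT", "s2": "AAGC", "s3": "CTGT", "s4": "CTGA"}
tree = split_top_down(sorted(seqs), seqs)
hit = classify(tree, "ATGT")  # a new sequence not in the alignment
```

Each internal node records the alignment position it split on, which is what lets the same structure double as a classification tree for new sequences; stopping the recursion early would leave multi-sequence leaves that can be read as candidate subfamilies.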

Inductive Logic Programming | 2013

Annotating transposable elements in the genome using relational decision tree ensembles

Eduardo De Paula Costa; Leander Schietgat; Ricardo Cerri; Celine Vens; Carlos Norberto Fischer; Claudia Marcia Aparecida Carareto; Jan Ramon; Hendrik Blockeel

EvoWorkshops | 2010