Eduardo De Paula Costa
Katholieke Universiteit Leuven
Publications
Featured research published by Eduardo De Paula Costa.
Genetics and Molecular Biology | 2005
Maurício Bacci; Rafael B.S. Soares; Eloiza Helena Tajara; Guilherme Ambar; Carlos Norberto Fischer; Ivan Rizzo Guilherme; Eduardo De Paula Costa; Vitor Fernandes Oliveira de Miranda
Transposable elements (TEs) are major components of eukaryotic genomes and are involved in cell regulation and organism evolution. We analyzed 123,889 expressed sequence tags from the Eucalyptus Genome Project database and found 124 sequences representing 76 TEs in 9 groups, of which the copia, MuDR and FAR1 groups were the most abundant. The low number of TE sequences may reflect highly efficient repression of these elements, a process called TE silencing. The frequency of TE groups in Eucalyptus libraries prepared from different tissues or physiological conditions of seedlings or adult plants indicated that developing plants express a much wider spectrum of TE groups than adult plants do. These preliminary results identify the most relevant TE groups involved in Eucalyptus development, which is important for industrial wood production.
Intelligent Data Analysis | 2013
Eduardo De Paula Costa; Sicco Verwer; Hendrik Blockeel
Decision trees estimate prediction certainty using the class distribution in the leaf responsible for the prediction. We introduce an alternative method that yields better estimates. For each instance to be classified, our method inserts the instance into the training set with one of the possible labels for the target attribute and induces a tree; this procedure is repeated for each of the labels. Then, by comparing the outcomes of the different trees, the method can identify instances that may be difficult to classify correctly and attribute some uncertainty to their predictions. We perform an extensive evaluation of the proposed method and show that it is particularly suitable for ranking and reliability estimation. The ideas investigated in this paper may also be applied to other machine learning techniques, as well as combined with other methods for prediction certainty estimation.
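The label-insertion idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the tiny one-feature tree learner (`fit_tree`), its `max_depth` and `min_leaf` settings, and the agreement-based score returned by `certainty` are all hypothetical choices. The paper's actual way of comparing the trees may differ.

```python
from collections import Counter

def majority(labels):
    # Most frequent label; ties go to the label seen first.
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_tree(points, depth=0, max_depth=2, min_leaf=2):
    """Fit a small tree on (x, label) pairs with one numeric feature x."""
    labels = [y for _, y in points]
    if depth == max_depth or len(set(labels)) == 1:
        return ("leaf", majority(labels))
    best = None
    xs = sorted(set(x for x, _ in points))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbouring values
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:
        return ("leaf", majority(labels))
    t = best[1]
    return ("node", t,
            fit_tree([p for p in points if p[0] <= t], depth + 1, max_depth, min_leaf),
            fit_tree([p for p in points if p[0] > t], depth + 1, max_depth, min_leaf))

def predict(tree, x):
    while tree[0] == "node":
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]

def certainty(train, x, labels):
    """Insert x once per candidate label, refit, and measure agreement.

    Returns (predicted label, fraction of trees that agree with it).
    The agreement fraction is an illustrative score, not the paper's.
    """
    votes = Counter()
    for label in labels:
        tree = fit_tree(train + [(x, label)])
        votes[predict(tree, x)] += 1
    pred, count = votes.most_common(1)[0]
    return pred, count / len(labels)

train = [(1, "a"), (2, "a"), (3, "a"), (7, "b"), (8, "b"), (9, "b")]
print(certainty(train, 2.5, ["a", "b"]))  # deep inside the "a" region
print(certainty(train, 5, ["a", "b"]))    # between the two classes
```

In this toy setting, an instance whose predicted label stays the same regardless of the label it was inserted with receives a high score, while an instance whose prediction simply follows the inserted label receives a low one.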
Evolutionary Bioinformatics | 2013
Eduardo De Paula Costa; Celine Vens; Hendrik Blockeel
We propose a novel method for the task of protein subfamily identification; that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree using a multiple alignment of the protein sequences as input, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and it associates particular mutations with each division into subclusters. The motivating hypothesis is that this may yield a better tree topology and, as a result, more accurate subfamily identification, while additionally indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis. The novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
PLOS Computational Biology | 2018
Leander Schietgat; Celine Vens; Ricardo Cerri; Carlos Norberto Fischer; Eduardo De Paula Costa; Jan Ramon; Claudia Marcia Aparecida Carareto; Hendrik Blockeel
Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of the TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana, and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques; it outperforms them in terms of predictive performance while remaining able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner's predictions that did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics | 2010
Celine Vens; Eduardo De Paula Costa; Hendrik Blockeel
We propose a novel distance-based method for phylogenetic tree reconstruction. Our method is based on a conceptual clustering method that extends the well-known decision tree learning approach. It starts from a single cluster and repeatedly splits it into subclusters until each sequence forms its own cluster. We assume that a split can be described by referring to particular polymorphic locations, which makes such a divisive method computationally feasible. To define the best split, we use a criterion that is close to Neighbor Joining's optimization criterion, namely, minimizing total branch length. A thorough experimental evaluation shows that our method yields phylogenetic trees with an accuracy comparable to that of existing methods. Moreover, it has a number of important advantages. First, by listing the polymorphic locations at the internal nodes, it provides an explanation for the resulting tree topology. Second, the top-down tree growing process can be stopped before a complete tree is generated, yielding an efficient gene or protein subfamily identification approach. Third, the resulting trees can be used as classification trees to classify new sequences into subfamilies.
Inductive Logic Programming | 2013
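The top-down splitting on polymorphic locations described in this abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the published algorithm: the split criterion used here (sum of within-cluster pairwise Hamming distances) is a simpler stand-in for the total-branch-length criterion named in the abstract, and the function names `best_split`, `build_tree` and `classify` are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    # Number of aligned positions at which two sequences differ.
    return sum(1 for x, y in zip(a, b) if x != y)

def within_cost(seqs):
    # Stand-in criterion (assumption): total pairwise distance in a cluster.
    return sum(hamming(a, b) for a, b in combinations(seqs, 2))

def best_split(seqs):
    """Try every test 'position pos has residue r' at polymorphic positions."""
    best = None
    for pos in range(len(seqs[0])):
        residues = sorted(set(s[pos] for s in seqs))
        if len(residues) < 2:
            continue  # position is not polymorphic in this cluster
        for r in residues:
            inside = [s for s in seqs if s[pos] == r]
            outside = [s for s in seqs if s[pos] != r]
            cost = within_cost(inside) + within_cost(outside)
            if best is None or cost < best[0]:
                best = (cost, pos, r, inside, outside)
    return best

def build_tree(seqs):
    """Split recursively until each distinct sequence has its own leaf."""
    split = best_split(seqs) if len(seqs) > 1 else None
    if split is None:
        return list(seqs)  # leaf: one sequence (or identical copies)
    _, pos, r, inside, outside = split
    return {"test": (pos, r),
            "yes": build_tree(inside),
            "no": build_tree(outside)}

def classify(tree, seq):
    """Route a new aligned sequence down the tree using the stored tests."""
    while isinstance(tree, dict):
        pos, r = tree["test"]
        tree = tree["yes"] if seq[pos] == r else tree["no"]
    return tree

aligned = ["AAAA", "AAAT", "GGCC", "GGCA"]
tree = build_tree(aligned)
print(tree["test"])            # position and residue tested at the root
print(classify(tree, "AAAT"))  # leaf reached by a sequence
```

Because every internal node stores the position and residue it tests, the same structure serves both as an explanation of the topology and as a classifier for new aligned sequences. Stopping the recursion early, which this sketch does not do, would correspond to the subfamily identification use mentioned in the abstract.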
Eduardo De Paula Costa; Leander Schietgat; Ricardo Cerri; Celine Vens; Carlos Norberto Fischer; Claudia Marcia Aparecida Carareto; Jan Ramon; Hendrik Blockeel
EvoWorkshops | 2010
Celine Vens; Eduardo De Paula Costa; Hendrik Blockeel
Archive | 2013
Eduardo De Paula Costa; Leander Schietgat; Ricardo Cerri; Celine Vens; Carlos Norberto Fischer; Claudia Ma Carareto; Jan Ramon; Hendrik Blockeel
Archive | 2009
Celine Vens; Eduardo De Paula Costa; Hendrik Blockeel
Revista Portuguesa de Estomatologia, Medicina Dentária e Cirurgia Maxilofacial | 2013
Carlos Pintado; Paula Vaz; Manuela Pintado; Eduardo De Paula Costa; L.A. Rocha; Antonio Felino