Publication


Featured research published by André O. Falcão.


PLOS Computational Biology | 2009

Semantic Similarity in Biomedical Ontologies

Catia Pesquita; Daniel Faria; André O. Falcão; Phillip Lord; Francisco M. Couto

In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.


BMC Bioinformatics | 2008

Metrics for GO based protein semantic similarity: a systematic evaluation

Catia Pesquita; Daniel Faria; Hugo P. Bastos; António E. N. Ferreira; André O. Falcão; Francisco M. Couto

Background: Several semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue is whether or not electronic annotations should be used in semantic similarity calculations. Results: We conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation. Conclusions: This work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resnik's measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
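
As a rough illustration of the groupwise measure named above, the following sketch computes a simGIC-style score: the summed information content (IC) of the terms two annotation sets share, divided by the summed IC of all terms in either set, after each set is extended with its ancestors. The toy ontology, the term names, and the annotation frequencies are invented for this example and are not taken from the paper or from the Gene Ontology.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ion_binding": ["binding"],
    "kinase": ["catalysis"],
}

# Hypothetical fraction of proteins annotated with each term (ancestors included).
FREQ = {"root": 1.0, "binding": 0.5, "catalysis": 0.4, "ion_binding": 0.1, "kinase": 0.05}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: rarer terms are more informative."""
    return -math.log(FREQ[term])


def extended(terms):
    """Extend an annotation set with the ancestors of every term."""
    result = set()
    for term in terms:
        result |= ancestors(term)
    return result


def sim_gic(terms_a, terms_b):
    """IC-weighted Jaccard index over the extended annotation sets."""
    ext_a, ext_b = extended(terms_a), extended(terms_b)
    union_ic = sum(ic(t) for t in ext_a | ext_b)
    if union_ic == 0:
        return 0.0
    return sum(ic(t) for t in ext_a & ext_b) / union_ic


if __name__ == "__main__":
    print(sim_gic({"ion_binding"}, {"binding"}))
    print(sim_gic({"ion_binding"}, {"kinase"}))
```

Because the score is a ratio over whole sets, it needs no separate step for combining pairwise term similarities, unlike the average, maximum, and best-match average approaches the abstract compares.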


Journal of Chemical Information and Modeling | 2012

A Bayesian approach to in silico blood-brain barrier penetration modeling.

Ines Filipa Martins; Ana L. Teixeira; Luis Pinheiro; André O. Falcão

The human blood-brain barrier (BBB) is a membrane that protects the central nervous system (CNS) by restricting the passage of solutes. The development of any new drug must take its existence into account, whether to design new molecules that target components of the CNS or, on the other hand, to find new substances that should not penetrate the barrier. Several studies in the literature have attempted to predict BBB penetration, so far with limited success and few, if any, applications to real-world drug discovery and development programs. Part of the reason is that only about 2% of small molecules can cross the BBB, and the available data sets are not representative of that reality, being generally biased with an over-representation of molecules that show an ability to permeate the BBB (BBB positives). To circumvent this limitation, the current study aims to devise and use a new approach based on Bayesian statistics, coupled with state-of-the-art machine learning methods, to produce a robust model capable of being applied in real-world drug research scenarios. The data set used, gathered from the literature, totals 1970 curated molecules, one of the largest for similar studies. Random Forests and Support Vector Machines were tested in various configurations against several chemical descriptor set combinations. Models were tested in a 5-fold cross-validation process, and the best one was tested over an independent validation set. The best fitted model produced an overall accuracy of 95%, with a mean square contingency coefficient (ϕ) of 0.74, and showed an overall capacity of 83% for predicting BBB positives and 96% for determining BBB negatives. This model was adapted into a Web-based tool made available for the whole community at http://b3pp.lasige.di.fc.ul.pt.
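
The evaluation figures quoted above (accuracy, the ϕ coefficient, and the rates for positives and negatives) all derive from a binary confusion matrix. The sketch below shows the standard formulas; the function name and the counts in the usage line are hypothetical and do not reproduce the paper's data.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # The phi coefficient (also known as the Matthews correlation coefficient)
    # stays informative when one class heavily outnumbers the other.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    phi = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # positives recovered
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # negatives recovered
        "phi": phi,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    print(confusion_metrics(tp=83, tn=96, fp=4, fn=17))
```

Reporting ϕ alongside accuracy matters for skewed data such as BBB sets, since a classifier that always predicts the majority class can score a high accuracy while ϕ stays at zero.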


PLOS ONE | 2012

Mining GO Annotations for Improving Annotation Consistency

Daniel Faria; Andreas Schlicker; Catia Pesquita; Hugo P. Bastos; António E. N. Ferreira; Mario Albrecht; André O. Falcão

Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, given that manual curation of all GO annotations is unfeasible, it is imperative to improve the quality of electronically inferred annotations. In this work, we analyze the full GO molecular function annotation of UniProtKB proteins, and discuss some of the issues that affect their quality, focusing particularly on the lack of annotation consistency. Based on our analysis, we estimate that 64% of the UniProtKB proteins are incompletely annotated, and that inconsistent annotations affect 83% of the protein functions and at least 23% of the proteins. Additionally, we present and evaluate a data mining algorithm, based on the association rule learning methodology, for identifying implicit relationships between molecular function terms. The goal of this algorithm is to assist GO curators in updating GO and correcting and preventing inconsistent annotations. Our algorithm predicted 501 relationships with an estimated precision of 94%, whereas the basic association rule learning methodology predicted 12,352 relationships with a precision below 9%.
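
To make the basic association rule learning methodology mentioned above concrete, here is a minimal sketch that mines rules of the form "term A implies term B" from protein annotation sets, keeping those that meet support and confidence thresholds. It illustrates only the generic baseline, not the authors' refined algorithm; the protein identifiers, annotations, function name, and thresholds are all hypothetical.

```python
from itertools import permutations


def mine_pair_rules(annotations, min_support=2, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for term pairs.

    support    = number of proteins annotated with both terms
    confidence = support / number of proteins annotated with the antecedent
    """
    term_count = {}
    pair_count = {}
    for terms in annotations.values():
        for term in terms:
            term_count[term] = term_count.get(term, 0) + 1
        # Ordered pairs, so A -> B and B -> A are counted as separate rules.
        for a, b in permutations(sorted(terms), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), support in pair_count.items():
        confidence = support / term_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical annotation sets keyed by made-up protein identifiers.
EXAMPLE = {
    "P1": {"ATP binding", "kinase activity"},
    "P2": {"ATP binding", "kinase activity"},
    "P3": {"ATP binding"},
    "P4": {"ATP binding", "kinase activity"},
    "P5": {"zinc ion binding"},
}

if __name__ == "__main__":
    for rule in mine_pair_rules(EXAMPLE):
        print(rule)
```

In this toy data every protein with "kinase activity" also has "ATP binding", so that direction passes the confidence threshold, while the reverse rule does not because one ATP-binding protein lacks the kinase annotation.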


Archive | 2002

Heuristics in Multi-Objective Forest Management

José G. Borges; Howard M. Hoganson; André O. Falcão

Heuristics have been used extensively to support forest management scheduling in the last two decades. The need for spatial definition, and the combined shortcomings of available technology and traditional mathematical programming approaches, sparked interest in alternative forest management scheduling techniques in the early 1980s. Concerns with the environmental impacts of forest management options further encouraged the development of heuristics to address adjacency relationships in harvesting decisions. More recently, heuristics have been used to target other multi-objective management issues. Namely, they have been used to provide information to help sustain both traditional forest products flows (e.g. timber and cork) and landscape structural characteristics (e.g., mosaic elements such as patch number and size, amounts of edge or interior space). In this chapter, we describe the current state of the art of heuristic application in forest management scheduling. Heuristic approaches are presented and discussed in the framework of forest management scheduling needs. Results from some heuristic research efforts are used to outline the application potential and shortcomings of these techniques.


Journal of Cheminformatics | 2013

Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons

Ana L. Teixeira; João Paulo Leal; André O. Falcão

Background: One of the main topics in the development of quantitative structure-property relationship (QSPR) predictive models is the identification of the subset of variables that represent the structure of a molecule and which are predictors for a given property. There are several automated feature selection methods, ranging from backward, forward or stepwise procedures to more elaborate methodologies such as evolutionary programming. The problem lies in selecting the minimum subset of descriptors that can predict a certain property with good performance, in a computationally efficient and robust way, since the presence of irrelevant or redundant features can cause poor generalization capacity. In this paper an alternative selection method, based on Random Forests to determine the variable importance, is proposed in the context of QSPR regression problems, with an application to a manually curated dataset for predicting standard enthalpy of formation. The subsequent predictive models are trained with support vector machines, introducing the variables sequentially from a ranked list based on the variable importance. Results: The model generalizes well even with a high dimensional dataset and in the presence of highly correlated variables. The feature selection step was shown to yield lower prediction errors, with RMSE values 23% lower than without feature selection, albeit using only 6% of the total number of variables (89 from the original 1485). The proposed approach further compared favourably with other feature selection methods and dimension reduction of the feature space. The predictive model was selected using a 10-fold cross validation procedure and, after selection, it was validated with an independent set to assess its performance when applied to new data. The results were similar to the ones obtained for the training set, supporting the robustness of the proposed approach. Conclusions: The proposed methodology seemingly improves the prediction performance of standard enthalpy of formation of hydrocarbons using a limited set of molecular descriptors, providing faster and more cost-effective calculation of descriptors by reducing their numbers, and providing a better understanding of the underlying relationship between the molecular structure represented by descriptors and the property of interest.


Neurocomputing | 2006

Flexible kernels for RBF networks

André O. Falcão; Thibault Langlois; Andreas Wichert

In this paper we propose a novel approach for modeling kernels in Radial Basis Function networks. The method provides an extra degree of flexibility to the kernel structure. This flexibility comes through the use of modifier functions applied to the distance computation procedure, essential for all kernel evaluations. Initially the classifier uses an unsupervised method to construct the network topology, where most parameters of the network are defined without any customization from the user. During the second phase only one parameter per kernel is estimated. Experimental evidence on four datasets shows that the algorithm is robust and competitive.


Proceedings of the 14th International Academic MindTrek Conference on Envisioning Future Media Environments | 2010

VIRUS: video information retrieval using subtitles

Thibault Langlois; Teresa Chambel; Eva Oliveira; Paula Carvalho; Gonçalo Marques; André O. Falcão

Video is a very rich medium that is becoming increasingly dominant. A massive amount of video information is available, but very difficult to access if not adequately indexed: a challenging task to accomplish. We describe a Video Information Retrieval system, under development, that operates on a database composed of subtitled documents. The simultaneous analysis of video, subtitles and audio streams is performed in order to index, visualize and retrieve excerpts of video documents that share a certain emotional or semantic property.


Decision Support Systems | 2014

Collaborative development of a semantic wiki on forest management decision support

A.F. Marques; C. Rosset; J. Rasinmaki; Harald Vacik; S.N. Gordon; S. Nobre; André O. Falcão; D. Weber; M. Granitzer; Ljusk-Ola Eriksson

Semantic wikis support collaboratively editing, categorising, interlinking and retrieving web pages for a group of experts working in a certain domain. The use of semantic technologies allows the expression of wiki content in a more structured way, which increases its potential use. This contribution presents an overview of the development process towards a semantic wiki related to a repository of forest decision support systems, including models, methods and data used, as well as case studies and lessons learned. An international group of experts took part in the conceptualisation of the semantic wiki (i.e. identification of wiki properties and forms), provided content and developed queries to analyse the information gathered. The resulting ForestDSS wiki gives an overview of the current use, development and application of forest decision support systems worldwide. Based on the experiences gathered during the process, some challenges are reported and conclusions on further developments are made.


Journal of Chemical Information and Modeling | 2014

Structural similarity based kriging for quantitative structure activity and property relationship modeling.

Ana L. Teixeira; André O. Falcão

Structurally similar molecules tend to have similar properties, i.e. closer molecules in the molecular space are more likely to yield similar property values while distant molecules are more likely to yield different values. Based on this principle, we propose the use of a new method that takes into account the high dimensionality of the molecular space, predicting chemical, physical, or biological properties based on the most similar compounds with measured properties. This methodology uses ordinary kriging coupled with three different molecular similarity approaches (based on molecular descriptors, fingerprints, and atom matching), which creates an interpolation map over the molecular space that is capable of predicting properties/activities for diverse chemical data sets. The proposed method was tested in two data sets of diverse chemical compounds collected from the literature and preprocessed. One of the data sets contained dihydrofolate reductase inhibition activity data, and the second contained molecules for which aqueous solubility was known. The overall predictive results using kriging for both data sets comply with the results obtained in the literature using typical QSPR/QSAR approaches. However, the procedure did not involve any type of descriptor selection or even minimal information about each problem, suggesting that this approach is directly applicable to a large spectrum of problems in QSAR/QSPR. Furthermore, the predictive results improve significantly with the similarity threshold between the training and testing compounds, allowing the definition of a confidence threshold of similarity and error estimation for each case inferred. The use of kriging for interpolation over the molecular metric space is independent of the training data set size, and no reparametrizations are necessary when compounds are added to or removed from the set. Increasing the size of the database will consequently improve the quality of the estimations. Finally, it is shown that this model can be used for checking the consistency of measured data and for guiding an extension of the training set by determining the regions of the molecular space for which new experimental measurements could be used to maximize the model's predictive performance.
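
The idea of ordinary kriging over a similarity-based molecular space can be sketched in a few lines. The example below is only a toy under stated assumptions: fingerprints are modelled as sets of "on" bit indices, similarity is the Tanimoto coefficient, and the semivariogram is simply one minus similarity. None of these choices, nor the function names and data, are taken from the paper, which may define similarity and the variogram differently.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def semivariance(a, b):
    """Assumed semivariogram: dissimilarity grows as similarity drops."""
    return 1.0 - tanimoto(a, b)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def krige(train_fps, train_values, query_fp):
    """Ordinary kriging: weights sum to one and minimise the estimation variance."""
    n = len(train_fps)
    # Kriging system: semivariances between training points, bordered by the
    # unbiasedness constraint (last row and column, with a Lagrange multiplier).
    system = [
        [semivariance(train_fps[i], train_fps[j]) for j in range(n)] + [1.0]
        for i in range(n)
    ]
    system.append([1.0] * n + [0.0])
    rhs = [semivariance(fp, query_fp) for fp in train_fps] + [1.0]
    weights = solve(system, rhs)[:n]
    prediction = sum(w * y for w, y in zip(weights, train_values))
    return prediction, weights


# Hypothetical fingerprints (sets of "on" bit indices) and property values.
FPS = [{1, 2, 3}, {2, 3, 4}, {5, 6}]
VALUES = [1.0, 2.0, 5.0]

if __name__ == "__main__":
    print(krige(FPS, VALUES, {1, 2, 4}))
```

Adding or removing a training compound only changes the rows of the linear system, so nothing has to be refitted, which matches the abstract's remark that no reparametrization is needed as the data set changes.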

Collaboration


Dive into André O. Falcão's collaborations.

Top Co-Authors

João Paulo Leal

Instituto Superior Técnico
