Catia Pesquita
University of Lisbon
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Catia Pesquita.
Nature Biotechnology | 2009
Ian W. Taylor; Rune Linding; David Warde-Farley; Yongmei Liu; Catia Pesquita; Daniel Faria; Shelley B. Bull; Tony Pawson; Quaid Morris; Jeffrey L. Wrana
Changes in the biochemical wiring of oncogenic cells drives phenotypic transformations that directly affect disease outcome. Here we examine the dynamic structure of the human protein interaction network (interactome) to determine whether changes in the organization of the interactome can be used to predict patient outcome. An analysis of hub proteins identified intermodular hub proteins that are co-expressed with their interacting partners in a tissue-restricted manner and intramodular hub proteins that are co-expressed with their interacting partners in all or most tissues. Substantial differences in biochemical structure were observed between the two types of hubs. Signaling domains were found more often in intermodular hub proteins, which were also more frequently associated with oncogenesis. Analysis of two breast cancer patient cohorts revealed that altered modularity of the human interactome may be useful as an indicator of breast cancer prognosis.
PLOS Computational Biology | 2009
Catia Pesquita; Daniel Faria; André O. Falcão; Phillip Lord; Francisco M. Couto
In recent years, ontologies have become a mainstream topic in biomedical research. When biological entities are described using a common schema, such as an ontology, they can be compared by means of their annotations. This type of comparison is called semantic similarity, since it assesses the degree of relatedness between two entities by the similarity in meaning of their annotations. The application of semantic similarity to biomedical ontologies is recent; nevertheless, several studies have been published in the last few years describing and evaluating diverse approaches. Semantic similarity has become a valuable tool for validating the results drawn from biomedical studies such as gene clustering, gene expression data analysis, prediction and validation of molecular interactions, and disease gene prioritization. We review semantic similarity measures applied to biomedical ontologies and propose their classification according to the strategies they employ: node-based versus edge-based and pairwise versus groupwise. We also present comparative assessment studies and discuss the implications of their results. We survey the existing implementations of semantic similarity measures, and we describe examples of applications to biomedical research. This will clarify how biomedical researchers can benefit from semantic similarity measures and help them choose the approach most suitable for their studies. Biomedical ontologies are evolving toward increased coverage, formality, and integration, and their use for annotation is increasingly becoming a focus of both effort by biomedical experts and application of automated annotation procedures to create corpora of higher quality and completeness than are currently available. Given that semantic similarity measures are directly dependent on these evolutions, we can expect to see them gaining more relevance and even becoming as essential as sequence similarity is today in biomedical research.
BMC Bioinformatics | 2008
Catia Pesquita; Daniel Faria; Hugo P. Bastos; António E. N. Ferreira; André O. Falcão; Francisco M. Couto
BackgroundSeveral semantic similarity measures have been applied to gene products annotated with Gene Ontology terms, providing a basis for their functional comparison. However, it is still unclear which is the best approach to semantic similarity in this context, since there is no conclusive evaluation of the various measures. Another issue, is whether electronic annotations should or not be used in semantic similarity calculations.ResultsWe conducted a systematic evaluation of GO-based semantic similarity measures using the relationship with sequence similarity as a means to quantify their performance, and assessed the influence of electronic annotations by testing the measures in the presence and absence of these annotations. We verified that the relationship between semantic and sequence similarity is not linear, but can be well approximated by a rescaled Normal cumulative distribution function. Given that the majority of the semantic similarity measures capture an identical behaviour, but differ in resolution, we used the latter as the main criterion of evaluation.ConclusionsThis work has provided a basis for the comparison of several semantic similarity measures, and can aid researchers in choosing the most adequate measure for their work. We have found that the hybrid simGIC was the measure with the best overall performance, followed by Resniks measure using a best-match average combination approach. We have also found that the average and maximum combination approaches are problematic since both are inherently influenced by the number of terms being combined. We suspect that there may be a direct influence of data circularity in the behaviour of the results including electronic annotations, as a result of functional inference from sequence similarity.
Lecture Notes in Computer Science | 2013
Daniel Faria; Catia Pesquita; Emanuel Santos; Matteo Palmonari; Isabel F. Cruz; Francisco M. Couto
AgreementMaker is one of the leading ontology matching systems, thanks to its combination of a flexible and extensible framework with a comprehensive user interface. In many domains, such as the biomedical, ontologies are becoming increasingly large thus presenting new challenges. We have developed a new core framework, AgreementMakerLight, focused on computational efficiency and designed to handle very large ontologies, while preserving most of the flexibility and extensibility of the original AgreementMaker framework. We evaluated the efficiency of AgreementMakerLight in two OAEI tracks: Anatomy and Large Biomedical Ontologies, obtaining excellent run time results. In addition, for the Anatomy track, AgreementMakerLight is now the best system as measured in terms of F-measure. Also in terms of F-measure, AgreementMakerLight is competitive with the best OAEI performers in two of the three tasks of the Large Biomedical Ontologies track that match whole ontologies.
PLOS ONE | 2015
Emanuel Santos; Daniel Faria; Catia Pesquita; Francisco M. Couto
Ontology Matching aims at identifying a set of semantic correspondences, called an alignment, between related ontologies. In recent years, there has been a growing interest in efficient and effective matching methods for large ontologies. However, alignments produced for large ontologies are often logically incoherent. It was only recently that the use of repair techniques to improve the coherence of ontology alignments began to be explored. This paper presents a novel modularization technique for ontology alignment repair which extracts fragments of the input ontologies that only contain the necessary classes and relations to resolve all detectable incoherences. The paper presents also an alignment repair algorithm that uses a global repair strategy to minimize both the degree of incoherence and the number of mappings removed from the alignment, while overcoming the scalability problem by employing the proposed modularization technique. Our evaluation shows that our modularization technique produces significantly small fragments of the ontologies and that our repair algorithm produces more complete alignments than other current alignment repair systems, while obtaining an equivalent degree of incoherence. Additionally, we also present a variant of our repair algorithm that makes use of the confidence values of the mappings to improve alignment repair. Our repair algorithm was implemented as part of AgreementMakerLight, a free and open-source ontology matching system.
PLOS ONE | 2012
Daniel Faria; Andreas Schlicker; Catia Pesquita; Hugo P. Bastos; António E. N. Ferreira; Mario Albrecht; André O. Falcão
Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, given that manual curation of all GO annotations is unfeasible, it is imperative to improve the quality of electronically inferred annotations. In this work, we analyze the full GO molecular function annotation of UniProtKB proteins, and discuss some of the issues that affect their quality, focusing particularly on the lack of annotation consistency. Based on our analysis, we estimate that 64% of the UniProtKB proteins are incompletely annotated, and that inconsistent annotations affect 83% of the protein functions and at least 23% of the proteins. Additionally, we present and evaluate a data mining algorithm, based on the association rule learning methodology, for identifying implicit relationships between molecular function terms. The goal of this algorithm is to assist GO curators in updating GO and correcting and preventing inconsistent annotations. Our algorithm predicted 501 relationships with an estimated precision of 94%, whereas the basic association rule learning methodology predicted 12,352 relationships with a precision below 9%.
International Scholarly Research Notices | 2012
Tiago Grego; Catia Pesquita; Hugo P. Bastos; Francisco M. Couto
Chemical entities are ubiquitous through the biomedical literature and the development of text-mining systems that can efficiently identify those entities are required. Due to the lack of available corpora and data resources, the community has focused its efforts in the development of gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, this task can be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular-dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2–5% for the entity-resolution task, and 15% for combined entity recognition and resolution tasks.
PLOS Computational Biology | 2012
Catia Pesquita; Francisco M. Couto
Developing and extending a biomedical ontology is a very demanding task that can never be considered complete given our ever-evolving understanding of the life sciences. Extension in particular can benefit from the automation of some of its steps, thus releasing experts to focus on harder tasks. Here we present a strategy to support the automation of change capturing within ontology extension where the need for new concepts or relations is identified. Our strategy is based on predicting areas of an ontology that will undergo extension in a future version by applying supervised learning over features of previous ontology versions. We used the Gene Ontology as our test bed and obtained encouraging results with average f-measure reaching 0.79 for a subset of biological process terms. Our strategy was also able to outperform state of the art change capturing methods. In addition we have identified several issues concerning prediction of ontology evolution, and have delineated a general framework for ontology extension prediction. Our strategy can be applied to any biomedical ontology with versioning, to help focus either manual or semi-automated extension methods on areas of the ontology that need extension.
international semantic web conference | 2014
Daniel Faria; Ernesto Jiménez-Ruiz; Catia Pesquita; Emanuel Santos; Francisco M. Couto
BioPortal is a repository for biomedical ontologies that also includes mappings between them from various sources. Considered as a whole, these mappings may cause logical errors, due to incompatibilities between the ontologies or even erroneous mappings. We have performed an automatic evaluation of BioPortal mappings between 19 ontology pairs using the mapping repair systems of LogMap and AgreementMakerLight. We found logical errors in 11 of these pairs, which on average involved 22% of the mappings between each pair. Furthermore, we conducted a manual evaluation of the repair results to identify the actual sources of error, verifying that erroneous mappings were behind over 60% of the repairs. Given the results of our analysis, we believe that annotating BioPortal mappings with information about their logical conflicts with other mappings would improve their usability for semantic web applications and facilitate the identification of erroneous mappings. In future work, we aim to collaborate with BioPortal developers in extending BioPortal with these annotations.
international semantic web conference | 2016
Zlatan Dragisic; Valentina Ivanova; Patrick Lambrix; Daniel Faria; Ernesto Jiménez-Ruiz; Catia Pesquita
User validation is one of the challenges facing the ontology alignment community, as there are limits to the quality of automated alignment algorithms. In this paper we present a broad study on user validation of ontology alignments that encompasses three distinct but interrelated aspects: the profile of the user, the services of the alignment system, and its user interface. We discuss key issues pertaining to the alignment validation process under each of these aspects, and provide an overview of how current systems address them. Finally, we use experiments from the Interactive Matching track of the Ontology Alignment Evaluation Initiative (OAEI) 2015 to assess the impact of errors in alignment validation, and how systems cope with them as function of their services.