Alexandre P. Francisco
Instituto Superior Técnico
Publication
Featured research published by Alexandre P. Francisco.
BMC Bioinformatics | 2012
Alexandre P. Francisco; Cátia Vaz; Pedro T. Monteiro; José Melo-Cristino; Mário Ramirez; João A. Carriço
Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide the reproducible and comparable results needed for global-scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it.
Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole-genome sequencing approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used to visualize the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other available epidemiological data.
Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
BMC Bioinformatics | 2009
Alexandre P. Francisco; Miguel M. F. Bugalho; Mário Ramirez; João A. Carriço
Background: Multilocus Sequence Typing (MLST) is a frequently used typing method for analyzing the clonal relationships among strains of several clinically relevant microbial species. MLST is based on the sequences of housekeeping genes, which result in each strain having a distinct numerical allelic profile, abbreviated to a unique identifier: the sequence type (ST). The relatedness between two strains can then be inferred from the differences between their allelic profiles. For a more comprehensive analysis of the possible patterns of evolutionary descent, a set of rules was proposed and implemented in the eBURST algorithm. These rules divide a data set into several clusters of related strains, dubbed clonal complexes, by implementing a simple model of clonal expansion and diversification. Within each clonal complex, the rules identify which links between STs correspond to the most probable pattern of descent. However, the eBURST algorithm is not globally optimized, which can result in links within the clonal complexes that violate the proposed rules.
Results: Here, we present a globally optimized implementation of the eBURST algorithm, goeBURST. The search for a globally optimal solution led to the formalization of the problem as a graphic matroid, for which greedy algorithms that provide an optimal solution exist. Several public MLST data sets were tested, and differences between the two implementations were found and are discussed for five bacterial species: Enterococcus faecium, Streptococcus pneumoniae, Burkholderia pseudomallei, Campylobacter jejuni and Neisseria spp. A novel feature implemented in goeBURST is the representation of the tiebreak rule level reached before deciding whether a link should be drawn, which can be used to visually evaluate the reliability of the represented hypothetical pattern of descent.
Conclusion: goeBURST is a globally optimized implementation of the eBURST algorithm that identifies alternative patterns of descent for several bacterial species. Furthermore, the algorithm can be applied to any multilocus typing data based on the number of differences between numeric profiles. A software implementation is available at http://goeBURST.phyloviz.net.
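The greedy, matroid-based idea behind goeBURST can be sketched as Kruskal's algorithm run over links between STs, with ties broken by eBURST-style rules. This is a minimal illustration, not the goeBURST implementation: the names `hamming` and `goeburst_forest` are hypothetical, and ranking each link by the larger and then smaller endpoint counts of single-, double- and triple-locus variants (SLVs, DLVs, TLVs), then isolate frequency, then lowest ST id, is an assumed ordering.

```python
from itertools import combinations


def hamming(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def goeburst_forest(profiles, freq=None, max_level=1):
    """Greedy (Kruskal-style) forest over STs with eBURST-like tie-breaks.

    profiles: dict ST id -> tuple of allele numbers (all the same length).
    freq: optional dict ST id -> number of isolates (default 1 each).
    max_level: largest number of differing loci allowed for a link.
    Returns a sorted list of (st_a, st_b) links with st_a < st_b.
    """
    freq = freq or {}
    sts = sorted(profiles)
    # lv[s][k]: number of STs differing from s at exactly k loci (k = 1..3)
    lv = {s: [0, 0, 0, 0] for s in sts}
    edges = []
    for s, t in combinations(sts, 2):  # s < t because sts is sorted
        d = hamming(profiles[s], profiles[t])
        if 1 <= d <= 3:
            lv[s][d] += 1
            lv[t][d] += 1
        if d <= max_level:
            edges.append((d, s, t))

    def key(edge):
        d, s, t = edge
        k = [d]  # fewer differences first
        # Assumed tie-breaks: more SLVs, then DLVs, then TLVs at the endpoints
        for level in (1, 2, 3):
            a, b = lv[s][level], lv[t][level]
            k += [-max(a, b), -min(a, b)]
        # Then higher isolate frequency, then the lower ST ids
        fa, fb = freq.get(s, 1), freq.get(t, 1)
        k += [-max(fa, fb), -min(fa, fb), s, t]
        return k

    parent = {s: s for s in sts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    links = []
    for _, s, t in sorted(edges, key=key):
        rs, rt = find(s), find(t)
        if rs != rt:  # accept only links that do not close a cycle
            parent[rs] = rt
            links.append((s, t))
    return sorted(links)
```

Because every tie is broken deterministically, the greedy pass yields a single forest; with the default `max_level=1`, only single-locus-variant links are drawn, as in a classic clonal-complex view.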
Nucleic Acids Research | 2011
Dário Abdulrehman; Pedro T. Monteiro; Miguel C. Teixeira; Nuno P. Mira; Artur B. Lourenço; Sandra Costa dos Santos; Tânia R. Cabrito; Alexandre P. Francisco; Sara C. Madeira; Ricardo Santos Aires; Arlindo L. Oliveira; Isabel Sá-Correia; Ana T. Freitas
The YEAst Search for Transcriptional Regulators And Consensus Tracking (YEASTRACT) information system (http://www.yeastract.com) was developed to support the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2010, this database contains over 48 200 regulatory associations between transcription factors (TFs) and target genes, including 298 specific DNA-binding sites for 110 characterized TFs. All regulatory associations stored in the database were revisited, and detailed information on the experimental evidence supporting those associations was added and classified as direct or indirect. The inclusion of these new data, gathered in response to requests from YEASTRACT users, allows users to restrict their queries to subsets of the data based on the presence or absence of experimental evidence for the direct action of the TFs in the promoter regions of their target genes. Another new feature of this release is the availability of all data through a machine-readable web-service interface. Users are no longer restricted to the queries offered through the existing web interface, and can use the web-service interface to query, retrieve and exploit the YEASTRACT data with their own implementations of additional functionality. The YEASTRACT information system is further complemented with several computational tools that facilitate the use of the curated data when answering a number of important biological questions. Since its first release in 2006, YEASTRACT has been extensively used by hundreds of researchers from all over the world. We expect that by making the new data and services available, the system will continue to be instrumental for yeast biologists and systems biology researchers.
Nucleic Acids Research | 2014
Miguel C. Teixeira; Pedro T. Monteiro; Joana F. Guerreiro; Joana P. Gonçalves; Nuno P. Mira; Sandra Costa dos Santos; Tânia R. Cabrito; Margarida Palma; Catarina Costa; Alexandre P. Francisco; Sara C. Madeira; Arlindo L. Oliveira; Ana T. Freitas; Isabel Sá-Correia
The YEASTRACT (http://www.yeastract.com) information system is a tool for the analysis and prediction of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2013, this database contains over 200 000 regulatory associations between transcription factors (TFs) and target genes, including 326 DNA binding sites for 113 TFs. All regulatory associations stored in YEASTRACT were revisited and new information was added on the experimental conditions in which those associations take place and on whether the TF is acting on its target genes as activator or repressor. Based on this information, new queries were developed allowing the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. This release further offers tools to rank the TFs controlling a gene or genome-wide response by their relative importance, based on (i) the percentage of target genes in the data set; (ii) the enrichment of the TF regulon in the data set when compared with the genome; or (iii) the score computed using the TFRank system, which selects and prioritizes the relevant TFs by walking through the yeast regulatory network. We expect that with the new data and services made available, the system will continue to be instrumental for yeast biologists and systems biology researchers.
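The first two TF-ranking criteria can be illustrated with a short sketch. It assumes a one-sided hypergeometric test for regulon enrichment; the abstract does not state which statistic YEASTRACT uses, and the TFRank network walk is not reproduced. The names `tf_rank`, `regulons` and `genome_size` are hypothetical.

```python
from math import comb


def tf_rank(regulons, dataset, genome_size):
    """Rank TFs for a gene set by regulon enrichment and target percentage.

    regulons: dict TF name -> set of documented target genes.
    dataset: set of genes of interest (e.g. a genome-wide response).
    genome_size: total number of genes N in the background.
    Returns (tf, percentage_of_dataset, p_value) tuples, most enriched first.
    """
    n = len(dataset)
    rows = []
    for tf, targets in regulons.items():
        k = len(targets & dataset)  # regulon members found in the data set
        big_k = len(targets)        # regulon size in the whole genome
        # (i) percentage of the data set's genes that this TF targets
        pct = 100.0 * k / n if n else 0.0
        # (ii) assumed enrichment test: P(X >= k), X ~ Hypergeometric(N, K, n)
        p = sum(comb(big_k, i) * comb(genome_size - big_k, n - i)
                for i in range(k, min(big_k, n) + 1)) / comb(genome_size, n)
        rows.append((tf, pct, p))
    # Smallest p-value first; ties broken by the larger percentage
    return sorted(rows, key=lambda r: (r[2], -r[1]))
```

For example, with a 10-gene genome and a 3-gene data set fully covered by a 3-gene regulon, that TF targets 100% of the set and gets p = 1/120.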
Nucleic Acids Research | 2007
Pedro T. Monteiro; Nuno D. Mendes; Miguel C. Teixeira; Sofia d’Orey; Sandra Tenreiro; Nuno P. Mira; Hélio Pais; Alexandre P. Francisco; Alexandra M. Carvalho; Artur B. Lourenço; Isabel Sá-Correia; Arlindo L. Oliveira; Ana T. Freitas
The Yeast search for transcriptional regulators and consensus tracking (YEASTRACT) information system (www.yeastract.com) was developed to support the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in September 2007, this database contains over 30 990 regulatory associations between transcription factors (TFs) and target genes, and includes 284 specific DNA binding sites for 108 characterized TFs. Computational tools are also provided to facilitate the exploitation of the gathered data when solving a number of biological questions, in particular those that involve the analysis of global gene expression results. In this new release, YEASTRACT includes DISCOVERER, a set of computational tools that can be used to identify complex motifs over-represented in the promoter regions of co-regulated genes. The identified motifs are then clustered into families, each represented by a position weight matrix, and automatically compared with the known transcription factor binding sites described in YEASTRACT. Additionally, this new release can generate graphic depictions of transcriptional regulatory networks for documented or potential regulatory associations between TFs and target genes. The visual display of these interaction networks is instrumental in functional studies. Tutorials exemplifying the use of all the available tools are provided in the system.
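A position weight matrix, as used above to represent motif families, can be sketched as per-position log-odds scores. This is a generic illustration assuming a uniform background and a pseudocount of 0.5; it does not reproduce DISCOVERER's motif discovery, clustering or comparison steps, `build_pwm` and `best_hit` are hypothetical names, and only one strand is scanned.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Log-odds position weight matrix from equal-length aligned sites."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        total = len(column) + 4 * pseudocount
        # log2(observed frequency / background frequency) for each base
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def best_hit(pwm, seq):
    """Return (score, offset) of the best-scoring window on the given strand."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for i in range(len(seq) - width + 1):
        score = sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        best = max(best, (score, i))
    return best
```

Usage: `best_hit(build_pwm(["TATA", "TATA", "TATT"]), "GGGTATAGGG")` reports the window starting at offset 3 with a positive score.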
Bioinformatics | 2017
Marta Nascimento; Adriano Sousa; Mário Ramirez; Alexandre P. Francisco; João A. Carriço; Cátia Vaz
High-throughput sequencing provides a cost-effective means of generating high-resolution data for hundreds or even thousands of strains, and is rapidly superseding methodologies based on a few genomic loci. The wealth of genomic data deposited in public databases such as the Sequence Read Archive/European Nucleotide Archive provides a powerful resource for evolutionary analysis and epidemiological surveillance. However, many of the analysis tools currently available neither scale well to these large datasets nor provide the means to fully integrate ancillary data. Here we present PHYLOViZ 2.0, an extension of the PHYLOViZ tool, a platform-independent Java tool that allows phylogenetic inference and data visualization for large datasets of sequence-based typing methods, including Single Nucleotide Polymorphism (SNP) and whole genome/core genome Multilocus Sequence Typing (wg/cgMLST) analysis. PHYLOViZ 2.0 incorporates new data analysis algorithms and new visualization modules, as well as the capability of saving projects for subsequent work or for dissemination of results.
Availability and implementation: http://www.phyloviz.net/ (licensed under GPLv3).
Supplementary information: Supplementary data are available at Bioinformatics online.
PLOS ONE | 2012
Joana P. Gonçalves; Alexandre P. Francisco; Yves Moreau; Sara C. Madeira
Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often carried out in a guilt-by-association scheme, it sorts promising candidates according to their relatedness to known disease genes. Network-based methods have successfully exploited this concept by capturing the interactions of genes or proteins into a score. Nonetheless, most current approaches have at least some of the following limitations: (1) networks comprise only curated physical interactions, leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs, and considering the full topology of weighted gene association networks integrating heterogeneous sources, should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, widening the gap to neighborhood and shortest-path scores mostly on single-source networks. Heterogeneous associations consistently delivered superior performance over single-source data across the majority of methods. Results on the contribution of confidence weights were inconclusive.
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.
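Heat diffusion ranking can be sketched with the graph heat kernel exp(-tL), where L is the Laplacian of a weighted gene association network: known disease genes start with one unit of heat each, and candidates are ranked by the heat they hold after diffusion time t. The unnormalized Laplacian, t = 1.0, the truncated Taylor series (adequate only for small graphs with modest t times degree) and the name `heat_diffusion_rank` are all assumptions, not the paper's exact settings.

```python
def heat_diffusion_rank(adj, seeds, t=1.0, tol=1e-12, max_terms=200):
    """Rank non-seed nodes by heat received from seed (known disease) genes.

    adj: symmetric n x n list of non-negative edge weights.
    seeds: indices of known disease genes (initial heat 1.0 each).
    t: diffusion time; larger t lets heat travel along longer, indirect paths.
    Computes exp(-t L) h0 with L = D - W via a truncated Taylor series.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    seed_set = set(seeds)

    def laplacian_times(vec):
        # (L v)_i = deg_i * v_i - sum_j w_ij * v_j
        return [deg[i] * vec[i] - sum(adj[i][j] * vec[j] for j in range(n))
                for i in range(n)]

    heat = [1.0 if i in seed_set else 0.0 for i in range(n)]
    term = heat[:]
    for k in range(1, max_terms):
        # term_k = (-t L)^k h0 / k!, built from term_{k-1}
        term = [-t * x / k for x in laplacian_times(term)]
        heat = [h + x for h, x in zip(heat, term)]
        if max(abs(x) for x in term) < tol:
            break
    candidates = [i for i in range(n) if i not in seed_set]
    return sorted(candidates, key=lambda i: -heat[i])
```

On a four-node path 0-1-2-3 seeded at node 0, heat decreases with distance from the seed, so the candidates rank as 1, 2, 3; unlike a pure neighborhood score, every node connected to a seed receives some heat through indirect paths.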
Nucleic Acids Research | 2016
Bruno Ribeiro-Gonçalves; Alexandre P. Francisco; Cátia Vaz; Mário Ramirez; João A. Carriço
High-throughput sequencing methods have generated allele and single nucleotide polymorphism information for thousands of bacterial strains, which is publicly available in online repositories, and have made it possible to generate similar information for hundreds to thousands more strains in a single study. Minimum spanning tree analysis of allelic data offers a scalable and reproducible methodological alternative to traditional phylogenetic inference approaches, useful in epidemiological investigations and population studies of bacterial pathogens. PHYLOViZ Online was developed to allow users to perform these analyses without installing software and to enable easy access to, and sharing of, data and analysis results from any Internet-enabled computer. PHYLOViZ Online also offers a RESTful API for programmatic access to data and algorithms, allowing it to be integrated seamlessly into any third-party web service or software. PHYLOViZ Online is freely available at https://online.phyloviz.net.
Algorithms for Molecular Biology | 2012
Susana Vinga; Alexandra M. Carvalho; Alexandre P. Francisco; Luís M. S. Russo; Jonas S. Almeida
Background: Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved to numerical systems. Characteristically, the CGR coordinates of substrings sharing an L-long suffix are located within a distance of $2^{-L}$ of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations.
Results: The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant-time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm.
Conclusions: The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of a unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.
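The CGR map and its suffix property can be checked in a few lines. This sketch assumes the common corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0) and a start at the center of the unit square; `cgr` is a hypothetical name. It only illustrates why points for strings sharing an L-long suffix lie within $2^{-L}$ of each other in each coordinate (every shared step halves their gap, which starts at most 1), not the paper's LCE or Rabin-Karp procedures.

```python
# Assumed corner assignment for the four nucleotides
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr(seq, start=(0.5, 0.5)):
    """Chaos Game Representation: move halfway toward each symbol's corner.

    Returns the list of points, one per prefix of seq.
    """
    x, y = start
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

For instance, "GATTACA" and "CCCTACA" share the 4-long suffix "TACA", so their final CGR points differ by at most $2^{-4}$ in each coordinate, however different their earlier symbols are.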
Ad Hoc Networks | 2015
Naércio Magaia; Alexandre P. Francisco; Paulo Rogério Pereira; Miguel Correia
Dynamic networks, in particular Delay Tolerant Networks (DTNs), are characterized by a lack of end-to-end paths at any given instant. Because of that, DTN routing protocols employ a store-carry-and-forward approach, holding messages until a suitable node to forward them to is found. However, selecting the best forwarding node poses a considerable challenge. Additional network information (static or dynamic) can be leveraged to aid routing protocols in this task. Centrality metrics, for instance, provide a means to differentiate the importance of nodes in the network. Among these metrics, betweenness centrality is one of the most prominent: it measures the degree to which a vertex is in a position of brokerage by summing the fraction of shortest paths between other pairs of vertices that pass through it. This paper surveys betweenness centrality, presenting its definitions and variants in static and dynamic networks, as well as the standard algorithms, exact and approximate, used to compute it. Finally, it surveys and discusses how DTN routing protocols use the betweenness centrality metric and its algorithms to aid message forwarding.
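Exact betweenness centrality is commonly computed with Brandes' algorithm, which runs one breadth-first search per source to count shortest paths and then accumulates pair dependencies in reverse BFS order. The sketch below assumes an unweighted, undirected, static graph given as an adjacency dict; `betweenness` is a hypothetical name, and this is a generic version rather than the presentation used in the survey.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj: dict node -> iterable of neighbours (each edge listed both ways).
    Returns dict node -> betweenness centrality. Each unordered pair is
    visited from both endpoints, so totals are halved at the end.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS from s, counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}
```

On a path a-b-c, only b lies on a shortest path between another pair, so it scores 1 while the endpoints score 0; the centre of a three-leaf star scores 3, one for each leaf pair. In a DTN setting the graph would change over time, which this static sketch does not model.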