Joana P. Gonçalves | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Joana P. Gonçalves is active.

Explore More

Publication

Featured researches published by Joana P. Gonçalves.

Nucleic Acids Research | 2014

The YEASTRACT database: an upgraded information system for the analysis of gene and genomic transcription regulation in Saccharomyces cerevisiae

Miguel C. Teixeira; Pedro T. Monteiro; Joana F. Guerreiro; Joana P. Gonçalves; Nuno P. Mira; Sandra Costa dos Santos; Tânia R. Cabrito; Margarida Palma; Catarina Costa; Alexandre P. Francisco; Sara C. Madeira; Arlindo L. Oliveira; Ana T. Freitas; Isabel Sá-Correia

The YEASTRACT (http://www.yeastract.com) information system is a tool for the analysis and prediction of transcription regulatory associations in Saccharomyces cerevisiae. Last updated in June 2013, this database contains over 200 000 regulatory associations between transcription factors (TFs) and target genes, including 326 DNA binding sites for 113 TFs. All regulatory associations stored in YEASTRACT were revisited and new information was added on the experimental conditions in which those associations take place and on whether the TF is acting on its target genes as activator or repressor. Based on this information, new queries were developed allowing the selection of specific environmental conditions, experimental evidence or positive/negative regulatory effect. This release further offers tools to rank the TFs controlling a gene or genome-wide response by their relative importance, based on (i) the percentage of target genes in the data set; (ii) the enrichment of the TF regulon in the data set when compared with the genome; or (iii) the score computed using the TFRank system, which selects and prioritizes the relevant TFs by walking through the yeast regulatory network. We expect that with the new data and services made available, the system will continue to be instrumental for yeast biologists and systems biology researchers.

BMC Bioinformatics | 2010

Candidate gene prioritization by network analysis of differential expression using machine learning approaches

Daniela Nitsch; Joana P. Gonçalves; Fabian Ojeda; Bart De Moor; Yves Moreau

BackgroundDiscovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of candidate genes of which only few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge by experimental data of differential gene expression between affected and healthy individuals.To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network.ResultsWe have proposed three strategies scoring disease candidate genes relying on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure approach which ranked the knockout gene on average at position 17 with an AUC value of 83.7%.ConclusionIn this study we could identify promising candidate genes using network based machine learning approaches even if no knowledge is available about the disease or phenotype.

Nucleic Acids Research | 2011

PINTA: a web server for network-based gene prioritization from expression data

Daniela Nitsch; Léon-Charles Tranchevent; Joana P. Gonçalves; Josef Korbinian Vogt; Sara C. Madeira; Yves Moreau

PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user.

BMC Research Notes | 2009

BiGGEsTS: integrated environment for biclustering analysis of time series gene expression data

Joana P. Gonçalves; Sara C. Madeira; Arlindo L. Oliveira

BackgroundThe ability to monitor changes in expression patterns over time, and to observe the emergence of coherent temporal responses using expression time series, is critical to advance our understanding of complex biological processes. Biclustering has been recognized as an effective method for discovering local temporal expression patterns and unraveling potential regulatory mechanisms. The general biclustering problem is NP-hard. In the case of time series this problem is tractable, and efficient algorithms can be used. However, there is still a need for specialized applications able to take advantage of the temporal properties inherent to expression time series, both from a computational and a biological perspective.FindingsBiGGEsTS makes available state-of-the-art biclustering algorithms for analyzing expression time series. Gene Ontology (GO) annotations are used to assess the biological relevance of the biclusters. Methods for preprocessing expression time series and post-processing results are also included. The analysis is additionally supported by a visualization module capable of displaying informative representations of the data, including heatmaps, dendrograms, expression charts and graphs of enriched GO terms.ConclusionBiGGEsTS is a free open source graphical software tool for revealing local coexpression of genes in specific intervals of time, while integrating meaningful information on gene annotations. It is freely available at: http://kdbio.inesc-id.pt/software/biggests. We present a case study on the discovery of transcriptional regulatory modules in the response of Saccharomyces cerevisiae to heat stress.

PLOS ONE | 2012

Interactogeneous: Disease Gene Prioritization Using Heterogeneous Networks and Full Topology Scores

Joana P. Gonçalves; Alexandre P. Francisco; Yves Moreau; Sara C. Madeira

Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches yield at least some of the following limitations: (1) networks comprise only curated physical interactions leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive. Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.

Bioinformatics | 2011

TFRank: network-based prioritization of regulatory associations underlying transcriptional responses

Joana P. Gonçalves; Alexandre P. Francisco; Nuno P. Mira; Miguel C. Teixeira; Isabel Sá-Correia; Arlindo L. Oliveira; Sara C. Madeira

MOTIVATION Uncovering mechanisms underlying gene expression control is crucial to understand complex cellular responses. Studies in gene regulation often aim to identify regulatory players involved in a biological process of interest, either transcription factors coregulating a set of target genes or genes eventually controlled by a set of regulators. These are frequently prioritized with respect to a context-specific relevance score. Current approaches rely on relevance measures accounting exclusively for direct transcription factor-target interactions, namely overrepresentation of binding sites or target ratios. Gene regulation has, however, intricate behavior with overlapping, indirect effect that should not be neglected. In addition, the rapid accumulation of regulatory data already enables the prediction of large-scale networks suitable for higher level exploration by methods based on graph theory. A paradigm shift is thus emerging, where isolated and constrained analyses will likely be replaced by whole-network, systemic-aware strategies. RESULTS We present TFRank, a graph-based framework to prioritize regulatory players involved in transcriptional responses within the regulatory network of an organism, whereby every regulatory path containing genes of interest is explored and incorporated into the analysis. TFRank selected important regulators of yeast adaptation to stress induced by quinine and acetic acid, which were missed by a direct effect approach. Notably, they reportedly confer resistance toward the chemicals. In a preliminary study in human, TFRank unveiled regulators involved in breast tumor growth and metastasis when applied to genes whose expression signatures correlated with short interval to metastasis.

IWPACBB | 2010

e-BiMotif: Combining Sequence Alignment and Biclustering to Unravel Structured Motifs

Joana P. Gonçalves; Sara C. Madeira

Transcription factors control transcription by binding to specific sites in the DNA sequences of the target genes, which can be modeled by structured motifs. In this paper, we propose e-BiMotif, a combination of both sequence alignment and a biclustering approach relying on efficient string matching techniques based on suffix trees to unravel all approximately conserved blocks (structured motifs) while straightforwardly disregarding non-conserved regions in-between. Since the length of conserved regions is usually easier to estimate than that of non-conserved regions separating the binding sites, ignoring the width of non-conserved regions is an advantage of the proposed method over other motif finders.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2014

Latebiclustering: efficient heuristic algorithm for time-lagged bicluster identification

Joana P. Gonçalves; Sara C. Madeira

Identifying patterns in temporal data is key to uncover meaningful relationships in diverse domains, from stock trading to social interactions. Also of great interest are clinical and biological applications, namely monitoring patient response to treatment or characterizing activity at the molecular level. In biology, researchers seek to gain insight into gene functions and dynamics of biological processes, as well as potential perturbations of these leading to disease, through the study of patterns emerging from gene expression time series. Clustering can group genes exhibiting similar expression profiles, but focuses on global patterns denoting rather broad, unspecific responses. Biclustering reveals local patterns, which more naturally capture the intricate collaboration between biological players, particularly under a temporal setting. Despite the general biclustering formulation being NP-hard, considering specific properties of time series has led to efficient solutions for the discovery of temporally aligned patterns. Notably, the identification of biclusters with time-lagged patterns, suggestive of transcriptional cascades, remains a challenge due to the combinatorial explosion of delayed occurrences. Herein, we propose LateBiclustering, a sensible heuristic algorithm enabling a polynomial rather than exponential time solution for the problem. We show that it identifies meaningful time-lagged biclusters relevant to the response of Saccharomyces cerevisiae to heat stress.

PLOS ONE | 2012

Regulatory Snapshots: Integrative Mining of Regulatory Modules from Expression Time Series and Regulatory Networks

Joana P. Gonçalves; Ricardo Santos Aires; Alexandre P. Francisco; Sara C. Madeira

Explaining regulatory mechanisms is crucial to understand complex cellular responses leading to system perturbations. Some strategies reverse engineer regulatory interactions from experimental data, while others identify functional regulatory units (modules) under the assumption that biological systems yield a modular organization. Most modular studies focus on network structure and static properties, ignoring that gene regulation is largely driven by stimulus-response behavior. Expression time series are key to gain insight into dynamics, but have been insufficiently explored by current methods, which often (1) apply generic algorithms unsuited for expression analysis over time, due to inability to maintain the chronology of events or incorporate time dependency; (2) ignore local patterns, abundant in most interesting cases of transcriptional activity; (3) neglect physical binding or lack automatic association of regulators, focusing mainly on expression patterns; or (4) limit the discovery to a predefined number of modules. We propose Regulatory Snapshots, an integrative mining approach to identify regulatory modules over time by combining transcriptional control with response, while overcoming the above challenges. Temporal biclustering is first used to reveal transcriptional modules composed of genes showing coherent expression profiles over time. Personalized ranking is then applied to prioritize prominent regulators targeting the modules at each time point using a network of documented regulatory associations and the expression data. Custom graphics are finally depicted to expose the regulatory activity in a module at consecutive time points (snapshots). Regulatory Snapshots successfully unraveled modules underlying yeast response to heat shock and human epithelial-to-mesenchymal transition, based on regulations documented in the YEASTRACT and JASPAR databases, respectively, and available expression data. Regulatory players involved in functionally enriched processes related to these biological events were identified. Ranking scores further suggested ability to discern the primary role of a gene (target or regulator). Prototype is available at: http://kdbio.inesc-id.pt/software/regulatorysnapshots.

data mining in bioinformatics | 2012

AliBiMotif: Integrating alignment and biclustering to unravel transcription factor binding sites in DNA sequences

Joana P. Gonçalves; Yves Moreau; Sara C. Madeira

Transcription Factors (TFs) control transcription by binding to specific sites in the promoter regions of the target genes, which can be modelled by structured motifs. In this paper we propose AliBiMotif, a method combining sequence alignment and a biclustering approach based on efficient string matching techniques using suffix trees to unravel approximately conserved sets of blocks (structured motifs) while straightforwardly disregarding non-conserved stretches in-between. The ability to ignore the width of non-conserved regions is a major advantage of the proposed method over other motif finders, as the lengths of the binding sites are usually easier to estimate than the separating distances.

Explore More