Juliana S. Bernardes
University of Paris
Publication
Featured research published by Juliana S. Bernardes.
The Plant Cell | 2016
Antonio Emidio Fortunato; Marianne Jaubert; Gen Enomoto; Jean-Pierre Bouly; Raffaella Raniello; Michael Thaler; Shruti Malviya; Juliana S. Bernardes; Fabrice Rappaport; Bernard Gentili; Marie J. J. Huysman; Alessandra Carbone; Chris Bowler; Maurizio Ribera d'Alcalà; Masahiko Ikeuchi; Angela Falciatore
Diatom phytochromes (DPH) display high sensitivity to far-red light in the far-red poor aquatic environment, opening new perspectives on signaling mechanisms in the marine realm. The absorption of visible light in aquatic environments has led to the common assumption that aquatic organisms sense and adapt to penetrative blue/green light wavelengths but show little or no response to the more attenuated red/far-red wavelengths. Here, we show that two marine diatom species, Phaeodactylum tricornutum and Thalassiosira pseudonana, possess a bona fide red/far-red light sensing phytochrome (DPH) that uses biliverdin as a chromophore and displays accentuated red-shifted absorbance peaks compared with other characterized plant and algal phytochromes. Exposure to both red and far-red light causes changes in gene expression in P. tricornutum, and the responses to far-red light disappear in DPH knockout cells, demonstrating that P. tricornutum DPH mediates far-red light signaling. The identification of DPH genes in diverse diatom species widely distributed along the water column further emphasizes the ecological significance of far-red light sensing, raising questions about the sources of far-red light. Our analyses indicate that, although far-red wavelengths from sunlight are only detectable at the ocean surface, chlorophyll fluorescence and Raman scattering can generate red/far-red photons in deeper layers. This study opens up novel perspectives on phytochrome-mediated far-red light signaling in the ocean and on the light sensing and adaptive capabilities of marine phototrophs.
BMC Bioinformatics | 2007
Juliana S. Bernardes; Alberto M. R. Dávila; Vítor Santos Costa; Gerson Zaverucha
Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Here, we assess the impact of using structural alignments on pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and a paired two-tailed t-test. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, the sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.
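The leave-one-family-out protocol mentioned in the abstract above can be sketched as follows. This is only an illustrative reading of the protocol, not the authors' code: the record layout `(sequence_id, superfamily, family)`, the function name `leave_one_family_out`, and the choice to skip super-families with a single family are all assumptions. For each super-family, one family is held out as the test set and the remaining families of the same super-family are used for training.

```python
from collections import defaultdict


def leave_one_family_out(records):
    """Yield (superfamily, held_out_family, train_ids, test_ids) splits.

    `records` is an iterable of (sequence_id, superfamily, family) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    # Group sequence ids by super-family, then by family.
    by_superfamily = defaultdict(lambda: defaultdict(list))
    for seq_id, superfamily, family in records:
        by_superfamily[superfamily][family].append(seq_id)

    for superfamily, families in by_superfamily.items():
        # Assumption: a super-family with one family leaves nothing to train on.
        if len(families) < 2:
            continue
        for held_out, test_ids in families.items():
            # Train on every other family of the same super-family.
            train_ids = [
                seq_id
                for family, ids in families.items()
                if family != held_out
                for seq_id in ids
            ]
            yield superfamily, held_out, train_ids, list(test_ids)
```

Each split would then be used to train a model on `train_ids` and score it on `test_ids`; the per-split scores are what a ROC curve or paired test would be computed from.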
BMC Bioinformatics | 2008
Juliana S. Bernardes; Jorge H. Fernandez; Ana Tereza Ribeiro de Vasconcelos
Background: The Structural Descriptor Database (SDDB) is a web-based tool that predicts the function of proteins and functional site positions based on the structural properties of related protein families. Structural alignments and functional residues of a known protein set (defined as the training set) are used to build special Hidden Markov Models (HMM) called HMM descriptors. SDDB uses previously calculated and stored HMM descriptors for predicting active sites, binding residues, and protein function. The database integrates biologically relevant data filtered from several databases such as PDB, PDBSUM, CSA and SCOP. It accepts queries in FASTA format and predicts functional residue positions, protein-ligand interactions, and protein function, based on the SCOP database. Results: To assess the SDDB performance, we used different data sets. The Trypsin-like Serine protease data set assessed how well SDDB predicts functional sites when curated data is available. The SCOP family data set was used to analyze SDDB performance by using training data extracted from PDBSUM (binding sites) and from CSA (active sites). The ATP-binding experiment was used to compare our approach with the most current method. For all evaluations, significant improvements were obtained with SDDB. Conclusion: SDDB performed better when reliable training data was available. SDDB predicted active sites better than binding sites because the former are more conserved than the latter. Nevertheless, by using our prediction method we obtained results with precision above 70%.
BMC Bioinformatics | 2015
Juliana S. Bernardes; Fabio R. J. Vieira; Lygia M. M. Costa; Gerson Zaverucha
Background: An important problem in computational biology is the automatic detection of protein families (groups of homologous sequences). Clustering sequences into families is at the heart of most comparative studies dealing with protein evolution, structure, and function. Many methods have been developed for this task, and they perform reasonably well (F-measure over 0.88) when grouping proteins with high sequence identity. However, for highly diverged proteins the performance of these methods can be much lower, mainly because a common evolutionary origin is not deduced directly from sequence similarity. To the best of our knowledge, a systematic evaluation of clustering methods over distant homologous proteins is still lacking. Results: We performed a comparative assessment of four clustering algorithms: Markov Clustering (MCL), Transitive Clustering (TransClust), Spectral Clustering of Protein Sequences (SCPS), and High-Fidelity clustering of protein sequences (HiFix), considering several datasets with different levels of sequence similarity. Two types of similarity measures, required by the sequence clustering methods, were used to evaluate the performance of the algorithms: the standard measure obtained from sequence–sequence comparisons, and a novel measure based on profile-profile comparisons, used here for the first time. Conclusions: The results reveal low clustering performance for the highly divergent datasets when the standard measure was used. However, the novel measure based on profile-profile comparisons substantially improved the performance of the four methods, especially when very low sequence identity datasets were evaluated. We also performed a parameter optimization step to determine the best configuration for each clustering method. We found that TransClust clearly outperformed the other methods for most datasets.
This work also provides guidelines for the practical application of sequence clustering methods aimed at accurately detecting groups of related protein sequences.
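The abstract above reports clustering quality as an F-measure without saying which variant was used. As a hedged illustration only, the sketch below computes one common variant, the pairwise F-measure, in which a pair of sequences counts as a true positive when it is co-clustered both in the reference families and in the predicted clusters. The function names and the dictionary-based input format are assumptions made for this example.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of unordered item pairs that share a label."""
    groups = {}
    for item, label in labels.items():
        groups.setdefault(label, []).append(item)
    pairs = set()
    for members in groups.values():
        pairs.update(frozenset(pair) for pair in combinations(members, 2))
    return pairs


def pairwise_f_measure(true_labels, predicted_labels):
    """Pairwise F-measure between a reference and a predicted clustering.

    Both arguments map each sequence id to a family or cluster label.
    Returns 0.0 when either clustering has no co-clustered pair.
    """
    true_pairs = co_clustered_pairs(true_labels)
    predicted_pairs = co_clustered_pairs(predicted_labels)
    true_positives = len(true_pairs & predicted_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_pairs)
    recall = true_positives / len(true_pairs)
    return 2 * precision * recall / (precision + recall)
```

A perfect clustering scores 1.0 under this definition; splitting a true family lowers recall, while merging unrelated sequences lowers precision.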
Bioinformatics | 2016
Juliana S. Bernardes; Fabio Rocha Jimenez Vieira; Gerson Zaverucha; Alessandra Carbone
Motivation: Given a protein sequence and a number of potential domains matching it, what are the domain content and the most likely domain architecture for the sequence? This problem is of fundamental importance in protein annotation, constituting one of the main steps of all predictive annotation strategies. On the other hand, when potential domains are several and in conflict because of overlapping domain boundaries, finding a solution for the problem might become difficult. An accurate prediction of the domain architecture of a multi-domain protein provides important information for function prediction, comparative genomics and molecular evolution. Results: We developed DAMA (Domain Annotation by a Multi-objective Approach), a novel approach that identifies architectures through a multi-objective optimization algorithm combining scores of domain matches, previously observed multi-domain co-occurrence and domain overlapping. DAMA has been validated on a known benchmark dataset based on CATH structural domain assignments and on the set of Plasmodium falciparum proteins. When compared with existing tools on both datasets, it outperforms all of them. Availability and implementation: DAMA software is implemented in C++ and the source code can be found at http://www.lcqb.upmc.fr/DAMA. Supplementary information: Supplementary data are available at Bioinformatics online.
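To make the conflict between overlapping domain matches concrete, here is a deliberately simplified sketch. It is not DAMA: DAMA is described above as a multi-objective method that also uses domain co-occurrence and tolerates some overlap, whereas this example swaps in plain weighted interval scheduling, a single-objective dynamic program that picks the strictly non-overlapping set of matches with the highest total score. The `Match` type, its field names, and the domain names in the usage note are hypothetical.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    domain: str   # hypothetical domain identifier
    start: int    # first residue covered (inclusive)
    end: int      # last residue covered (inclusive)
    score: float  # match score, higher is better


def best_non_overlapping(matches: List[Match]) -> Tuple[float, List[Match]]:
    """Return (total score, chosen matches) with no two matches overlapping."""
    ordered = sorted(matches, key=lambda m: m.end)
    ends = [m.end for m in ordered]
    # best[i] holds the optimal solution using only the first i matches.
    best: List[Tuple[float, List[Match]]] = [(0.0, [])]
    for i, match in enumerate(ordered):
        # Count earlier matches that end strictly before this one starts.
        j = bisect_right(ends, match.start - 1, 0, i)
        take_score = best[j][0] + match.score
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [match]))
        else:
            best.append(best[i])
    return best[-1]
```

For three candidate matches `domA` (1-100, score 50), `domB` (80-200, score 60) and `domC` (120-250, score 40), the sketch keeps `domA` and `domC` for a total of 90, even though `domB` has the highest individual score. A multi-objective method would weigh further criteria before making that choice.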
PLOS Computational Biology | 2016
Juliana S. Bernardes; Gerson Zaverucha; Catherine Vaquero; Alessandra Carbone
Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts at annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: (1) for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences; (2) a decision-making protocol combines outcomes obtained from multiple models; (3) a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to a 30% improvement over the total number of Pfam domain predictions on the whole genome.
The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE.
Malaria Journal | 2017
Juliana S. Bernardes; Catherine Vaquero; Alessandra Carbone
Background: With the availability of complete genome sequences of both human and non-human Plasmodium parasites, it is now possible to use comparative genomics to look for orthology across Plasmodium species and for species-specific genes. Such comparative analyses could provide important clues for the development of new strategies to prevent and treat malaria in humans; however, the number of functionally annotated proteins is still low for all Plasmodium species. In the context of genomes that are hard to annotate because of sequence divergence, such as Plasmodium, domain co-occurrence becomes particularly important to trust predictions. In particular, domain architecture prediction can be used to improve the performance of existing annotation methods since homologous proteins might share their architectural context. Results: Plasmobase is a unique database designed for the comparative study of Plasmodium genomes. Domain architecture reconstruction in Plasmobase relies on DAMA, the state-of-the-art method in architecture prediction, while domain annotation is realised with CLADE, a novel annotation tool based on a multi-source strategy. Plasmobase significantly increases the Pfam domain coverage of all Plasmodium genomes and proposes new domain architectures as well as new domain families that have never been reported before for these genomes. It provides a visualization of domain architectures and allows for an easy comparison among architectures within Plasmodium species and with other species described in UniProt. Conclusions: Plasmobase is a valuable new resource for domain annotation in Plasmodium genomes. Its graphical presentation of protein sequences, based on domain architectures, will hopefully be of interest for comparative genomic studies.
It should help to discover species-specific genes, possibly underlying important phenotypic differences between parasites, and orthologous gene families for deciphering the biology of these complex and important Apicomplexan organisms. In conclusion, Plasmobase is a flexible and rich site where any biologist can find something of interest. Availability: Plasmobase is accessible at http://genome.lcqb.upmc.fr/plasmobase/.
Mbio | 2018
Ari Ugarte; Riccardo Vicedomini; Juliana S. Bernardes; Alessandra Carbone
Background: Biochemical and regulatory pathways have until recently been conceived and modelled within one cell type, one organism and one species. This vision is being dramatically changed by the advent of whole microbiome sequencing studies, revealing the role of symbiotic microbial populations in fundamental biochemical functions. The new landscape we face requires the reconstruction of biochemical and regulatory pathways at the community level in a given environment. In order to understand how environmental factors affect the genetic material and the dynamics of the expression from one environment to another, we want to evaluate the quantity of gene protein sequences or transcripts associated with a given pathway by precisely estimating the abundance of protein domains, their weak presence or absence in environmental samples. Results: MetaCLADE is a novel profile-based domain annotation pipeline based on a multi-source domain annotation strategy. It applies directly to reads and improves identification of the catalog of functions in microbiomes. MetaCLADE is applied to simulated data and to more than ten metagenomic and metatranscriptomic datasets from different environments where it outperforms InterProScan in the number of annotated domains. It is compared to the state-of-the-art non-profile-based and profile-based methods, UProC and HMM-GRASPx, showing complementary predictions to UProC. A combination of MetaCLADE and UProC improves even further the functional annotation of environmental samples. Conclusions: Learning about the functional activity of environmental microbial communities is a crucial step to understand microbial interactions and large-scale environmental impact. MetaCLADE has been explicitly designed for metagenomic and metatranscriptomic data and allows for the discovery of patterns in divergent sequences, thanks to its multi-source strategy.
MetaCLADE greatly improves current domain annotation methods and reaches a fine degree of accuracy in annotation of very different environments such as soil and marine ecosystems, ancient metagenomes and human tissues.
BMC Bioinformatics | 2011
Juliana S. Bernardes; Alessandra Carbone; Gerson Zaverucha
PLOS Computational Biology | 2018
Nika Abdollahi; Alexandre Albani; Eric Anthony; Agnes Baud; Mélissa Cardon; Robert Clerc; Dariusz Czernecki; Romain Conte; Laurent David; Agathe Delaune; Samia Djerroud; Pauline Fourgoux; Nadège Guiglielmoni; Jeanne Laurentie; Nathalie Lehmann; Camille Lochard; Rémi Montagne; Vasiliki Myrodia; Vaitea Opuu; Elise Parey; Lélia Polit; Sylvain Privé; Chloé Quignot; Maria Ruiz-Cuevas; Mariam Sissoko; Nicolas Sompairac; Audrey Vallerix; Violaine Verrecchia; Marc Delarue; Raphaël Guerois