Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Juliette Martin is active.

Publication


Featured researches published by Juliette Martin.


BMC Structural Biology | 2005

Protein secondary structure assignment revisited: a detailed analysis of different assignment methods

Juliette Martin; Guillaume Letellier; Antoine Marin; Jean-François Taly; Alexandre G. de Brevern; Jean-François Gibrat

BackgroundA number of methods are now available to perform automatic assignment of periodic secondary structures from atomic coordinates, based on different characteristics of the secondary structures. In general these methods exhibit a broad consensus as to the location of most helix and strand core segments in protein structures. However the termini of the segments are often ill-defined and it is difficult to decide unambiguously which residues at the edge of the segments have to be included. In addition, there is a twilight zone where secondary structure segments depart significantly from the idealized models of Pauling and Corey. For these segments, one has to decide whether the observed structural variations are merely distorsions or whether they constitute a break in the secondary structure.MethodsTo address these problems, we have developed a method for secondary structure assignment, called KAKSI. Assignments made by KAKSI are compared with assignments given by DSSP, STRIDE, XTLSSTR, PSEA and SECSTR, as well as secondary structures found in PDB files, on 4 datasets (X-ray structures with different resolution range, NMR structures).ResultsA detailed comparison of KAKSI assignments with those of STRIDE and PSEA reveals that KAKSI assigns slightly longer helices and strands than STRIDE in case of one-to-one correspondence between the segments. However, KAKSI tends also to favor the assignment of several short helices when STRIDE and PSEA assign longer, kinked, helices. Helices assigned by KAKSI have geometrical characteristics close to those described in the PDB. They are more linear than helices assigned by other methods. The same tendency to split long segments is observed for strands, although less systematically. We present a number of cases of secondary structure assignments that illustrate this behavior.ConclusionOur method provides valuable assignments which favor the regularity of secondary structure segments.


BMC Structural Biology | 2006

Analysis of an optimal hidden Markov model for secondary structure prediction

Juliette Martin; Jean-François Gibrat; François Rodolphe

BackgroundSecondary structure prediction is a useful first step toward 3D structure prediction. A number of successful secondary structure prediction methods use neural networks, but unfortunately, neural networks are not intuitively interpretable. On the contrary, hidden Markov models are graphical interpretable models. Moreover, they have been successfully used in many bioinformatic applications. Because they offer a strong statistical background and allow model interpretation, we propose a method based on hidden Markov models.ResultsOur HMM is designed without prior knowledge. It is chosen within a collection of models of increasing size, using statistical and accuracy criteria. The resulting model has 36 hidden states: 15 that model α-helices, 12 that model coil and 9 that model β-strands. Connections between hidden states and state emission probabilities reflect the organization of protein structures into secondary structure segments. We start by analyzing the model features and see how it offers a new vision of local structures. We then use it for secondary structure prediction. Our model appears to be very efficient on single sequences, with a Q3 score of 68.8%, more than one point above PSIPRED prediction on single sequences. A straightforward extension of the method allows the use of multiple sequence alignments, rising the Q3 score to 75.5%.ConclusionThe hidden Markov model presented here achieves valuable prediction results using only a limited number of parameters. It provides an interpretable framework for protein secondary structure architecture. Furthermore, it can be used as a tool for generating protein sequences with a given secondary structure content.


Evolution | 2014

PHENOTYPIC AND GENOTYPIC CONVERGENCES ARE INFLUENCED BY HISTORICAL CONTINGENCY AND ENVIRONMENT IN YEAST

Aymé Spor; Daniel J. Kvitek; Thibault Nidelet; Juliette Martin; Judith Legrand; Christine Dillmann; Aurélie Bourgais; Dominique de Vienne; Gavin Sherlock; Delphine Sicard

Different organisms have independently and recurrently evolved similar phenotypic traits at different points throughout history. This phenotypic convergence may be caused by genotypic convergence and in addition, constrained by historical contingency. To investigate how convergence may be driven by selection in a particular environment and constrained by history, we analyzed nine life‐history traits and four metabolic traits during an experimental evolution of six yeast strains in four different environments. In each of the environments, the population converged toward a different multivariate phenotype. However, the evolution of most traits, including fitness components, was constrained by history. Phenotypic convergence was partly associated with the selection of mutations in genes involved in the same pathway. By further investigating the convergence in one gene, BMH1, mutated in 20% of the evolved populations, we show that both the history and the environment influenced the types of mutations (missense/nonsense), their location within the gene itself, as well as their effects on multiple traits. However, these effects could not be easily predicted from ancestors’ phylogeny or past selection. Combined, our data highlight the role of pleiotropy and epistasis in shaping a rugged fitness landscape.


PLOS ONE | 2010

Classification of Protein Kinases on the Basis of Both Kinase and Non-Kinase Regions

Juliette Martin; Krishanpal Anamika; Narayanaswamy Srinivasan

Background Protein phosphorylation is a generic way to regulate signal transduction pathways in all kingdoms of life. In many organisms, it is achieved by the large family of Ser/Thr/Tyr protein kinases which are traditionally classified into groups and subfamilies on the basis of the amino acid sequence of their catalytic domains. Many protein kinases are multi-domain in nature but the diversity of the accessory domains and their organization are usually not taken into account while classifying kinases into groups or subfamilies. Methodology Here, we present an approach which considers amino acid sequences of complete gene products, in order to suggest refinements in sets of pre-classified sequences. The strategy is based on alignment-free similarity scores and iterative Area Under the Curve (AUC) computation. Similarity scores are computed by detecting common patterns between two sequences and scoring them using a substitution matrix, with a consistent normalization scheme. This allows us to handle full-length sequences, and implicitly takes into account domain diversity and domain shuffling. We quantitatively validate our approach on a subset of 212 human protein kinases. We then employ it on the complete repertoire of human protein kinases and suggest few qualitative refinements in the subfamily assignment stored in the KinG database, which is based on catalytic domains only. Based on our new measure, we delineate 37 cases of potential hybrid kinases: sequences for which classical classification based entirely on catalytic domains is inconsistent with the full-length similarity scores computed here, which implicitly consider multi-domain nature and regions outside the catalytic kinase domain. We also provide some examples of hybrid kinases of the protozoan parasite Entamoeba histolytica. Conclusions The implicit consideration of multi-domain architectures is a valuable inclusion to complement other classification schemes. The proposed algorithm may also be employed to classify other families of enzymes with multi-domain architecture.


BMC Bioinformatics | 2010

Mining protein loops using a structural alphabet and statistical exceptionality

Leslie Regad; Juliette Martin; Gregory Nuel; Anne-Claude Camproux

BackgroundProtein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is an uneasy task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied.ResultsWe developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints.ConclusionsWe developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might permit to decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.


BMC Biophysics | 2012

Arbitrary protein-protein docking targets biologically relevant interfaces.

Juliette Martin; Richard Lavery

BackgroundProtein-protein recognition is of fundamental importance in the vast majority of biological processes. However, it has already been demonstrated that it is very hard to distinguish true complexes from false complexes in so-called cross-docking experiments, where binary protein complexes are separated and the isolated proteins are all docked against each other and scored. Does this result, at least in part, reflect a physical reality? False complexes could reflect possible nonspecific or weak associations.ResultsIn this paper, we investigate the twilight zone of protein-protein interactions, building on an interesting outcome of cross-docking experiments: false complexes seem to favor residues from the true interaction site, suggesting that randomly chosen partners dock in a non-random fashion on protein surfaces. Here, we carry out arbitrary docking of a non-redundant data set of 198 proteins, with more than 300 randomly chosen probe proteins. We investigate the tendency of arbitrary partners to aggregate at localized regions of the protein surfaces, the shape and compositional bias of the generated interfaces, and the potential of this property to predict biologically relevant binding sites. We show that the non-random localization of arbitrary partners after protein-protein docking is a generic feature of protein structures. The interfaces generated in this way are not systematically planar or curved, but tend to be closer than average to the center of the proteins. These results can be used to predict biological interfaces with an AUC value up to 0.69 alone, and 0.72 when used in combination with evolutionary information. An appropriate choice of random partners and number of docking models make this method computationally practical. It is also noted that nonspecific interfaces can point to alternate interaction sites in the case of proteins with multiple interfaces. We illustrate the usefulness of arbitrary docking using PEBP (Phosphatidylethanolamine binding protein), a kinase inhibitor with multiple partners.ConclusionsAn approach using arbitrary docking, and based solely on physical properties, can successfully identify biologically pertinent protein interfaces.


Algorithms for Molecular Biology | 2010

Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data

Gregory Nuel; Leslie Regad; Juliette Martin; Anne-Claude Camproux

BackgroundIn bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models.ResultsThe latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also permits to avoid any product of convolution of the pattern distribution in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy-example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence.ConclusionsOur algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model displays a slow convergence toward the stationary distribution. We end up with a discussion on our method and on its potential improvements.


PLOS Computational Biology | 2010

Beauty Is in the Eye of the Beholder: Proteins Can Recognize Binding Sites of Homologous Proteins in More than One Way

Juliette Martin

Understanding the mechanisms of protein–protein interaction is a fundamental problem with many practical applications. The fact that different proteins can bind similar partners suggests that convergently evolved binding interfaces are reused in different complexes. A set of protein complexes composed of non-homologous domains interacting with homologous partners at equivalent binding sites was collected in 2006, offering an opportunity to investigate this point. We considered 433 pairs of protein–protein complexes from the ABAC database (AB and AC binary protein complexes sharing a homologous partner A) and analyzed the extent of physico-chemical similarity at the atomic and residue level at the protein–protein interface. Homologous partners of the complexes were superimposed using Multiprot, and similar atoms at the interface were quantified using a five class grouping scheme and a distance cut-off. We found that the number of interfacial atoms with similar properties is systematically lower in the non-homologous proteins than in the homologous ones. We assessed the significance of the similarity by bootstrapping the atomic properties at the interfaces. We found that the similarity of binding sites is very significant between homologous proteins, as expected, but generally insignificant between the non-homologous proteins that bind to homologous partners. Furthermore, evolutionarily conserved residues are not colocalized within the binding sites of non-homologous proteins. We could only identify a limited number of cases of structural mimicry at the interface, suggesting that this property is less generic than previously thought. Our results support the hypothesis that different proteins can interact with similar partners using alternate strategies, but do not support convergent evolution.


IEEE Intelligent Systems | 2005

Choosing the optimal hidden Markov model for secondary-structure prediction

Juliette Martin; Jean-François Gibrat; François Rodolphe

Proteins are major constituents of living cells, forming many cellular components and most enzymes. So, knowledge of 3D protein structures is essential to understand biological mechanisms. Researchers often use neural networks to predict secondary structure in proteins, but the networks can be hard to interpret. This alternative method uses an optimal and interpretable hidden Markov model to classify protein residues. These HMM models account for the transitions observed in 3D structures and allow a predictive approach. Weve developed a method for finding an optimal HMM to classify residues into secondary-structure classes. HMMs both provide a probabilistic framework for sequence treatment and produce interpretable models.


BMC Structural Biology | 2008

Structural deformation upon protein-protein interaction: A structural alphabet approach

Juliette Martin; Leslie Regad; Hélène Lecornet; Anne-Claude Camproux

BackgroundIn a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding.ResultsIn this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described thanks to a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. These information are used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins.ConclusionOur study provides qualitative information about induced fit. These results could be of help for flexible docking.

Collaboration


Dive into the Juliette Martin's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar

Jean-François Gibrat

Institut national de la recherche agronomique

View shared research outputs
Top Co-Authors

Avatar

François Rodolphe

Institut national de la recherche agronomique

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Mutharasu Gnanavel

Indian Institute of Science

View shared research outputs
Top Co-Authors

Avatar

Prachi Mehrotra

Indian Institute of Science

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge