Pablo Mier | Researchain

Publication

Featured research published by Pablo Mier.

Journal of Computational Biology | 2016

FastaHerder2: Four Ways to Research Protein Function and Evolution with Clustering and Clustered Databases

Pablo Mier; Miguel A. Andrade-Navarro

The accelerated growth of protein databases offers great possibilities for the study of protein function using sequence similarity and conservation. However, the huge number of sequences deposited in these databases requires new ways of analyzing and organizing the data. It is necessary to group the many very similar sequences, creating clusters with automatically derived annotations useful to understand their function, evolution, and level of experimental evidence. We developed an algorithm called FastaHerder2, which can cluster any protein database, putting together very similar protein sequences based on near-full-length similarity and/or a high threshold of sequence identity. We compressed 50 reference proteomes, along with the SwissProt database, which we could compress by 74.7%. The clustering algorithm was benchmarked using OrthoBench and compared with FASTA HERDER, a previous version of the algorithm, showing that FastaHerder2 can cluster a set of proteins yielding a high compression, with a lower error rate than its predecessor. We illustrate the use of FastaHerder2 to detect biologically relevant functional features in protein families. With our approach we seek to promote a modern view and usage of the protein sequence databases more appropriate to the postgenomic era.
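To make the idea of clustering by an identity threshold concrete, here is a minimal Python sketch. It is not the FastaHerder2 algorithm: the greedy representative-based strategy, the use of `difflib.SequenceMatcher` as a stand-in for alignment-based sequence identity, the function names, and the definition of compression as one minus the ratio of clusters to sequences are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for sequence identity (assumption); a real pipeline
    # would derive identity from a proper sequence alignment.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, threshold=0.9):
    """Assign each sequence to the first cluster whose representative
    (its first member) is at least `threshold` similar to it."""
    clusters = []
    # Longest sequences first, so representatives tend to be full length.
    for seq in sorted(sequences, key=len, reverse=True):
        for cluster in clusters:
            if similarity(cluster[0], seq) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def compression(sequences, clusters):
    # Fraction of entries saved by keeping one entry per cluster (assumed definition).
    return 1 - len(clusters) / len(sequences)
```

Under this assumed definition, a database of three sequences that collapses into two clusters would be compressed by one third.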

Scientific Reports | 2016

Efficient embedding of complex networks to hyperbolic space via their Laplacian

Gregorio Alanis-Lobato; Pablo Mier; Miguel A. Andrade-Navarro

The different factors involved in the growth process of complex networks imprint valuable information in their observable topologies. How to exploit this information to accurately predict structural network changes is the subject of active research. A recent model of network growth holds that the emergence of properties common to most complex systems is the result of certain trade-offs between node birth-time and similarity. This model has a geometric interpretation in hyperbolic space, where distances between nodes abstract this optimisation process. Current methods for network hyperbolic embedding search for node coordinates that maximise the likelihood that the network was produced by the aforementioned model. Here, a different strategy is followed in the form of the Laplacian-based Network Embedding, a simple yet accurate, efficient and data-driven manifold learning approach, which allows for the quick geometric analysis of big networks. Comparisons against existing embedding and prediction techniques highlight its applicability to network evolution and link prediction.

PLOS ONE | 2017

The Protein Structure Context of PolyQ Regions

Franziska Totzeck; Miguel A. Andrade-Navarro; Pablo Mier

Proteins containing glutamine repeats (polyQ) are known to be structurally unstable. Abnormal expansion of polyQ in some proteins exceeding a certain threshold leads to neurodegenerative disease, a symptom of which is the formation of protein aggregates. This has led to extensive research into the structure of polyQ stretches. However, the accumulation of contradictory results suggests that protein context might be of importance. Here we aimed to evaluate the structural context of polyQ regions in proteins by analysing the secondary structure of polyQ proteins and their homologs. The results revealed that the secondary structure in the vicinity of polyQ is predominantly random coil or helix. Importantly, the regions surrounding the polyQ are often not solved in 3D structures. In the few cases where the point of insertion of the polyQ was mapped to a full protein, we observed that these are always located on the surface of the protein. The findings support the hypothesis that polyQ might serve to extend coiled coils at their C-terminus in highly disordered regions involved in protein-protein interactions.

Proteins | 2017

Context characterization of amino acid homorepeats using evolution, position, and order

Pablo Mier; Gregorio Alanis-Lobato; Miguel A. Andrade-Navarro

Amino acid repeats, or homorepeats, are low complexity protein motifs consisting of tandem repetitions of a single amino acid. Their presence and relative number vary in different proteomes, and some studies have tried to address this variation, proteome by proteome. In this work, we present a full characterization of amino acid homorepeats across evolution. We studied the presence and differential usage of each possible homorepeat in proteomes from various taxonomic groups, using clusters of very similar proteins to eliminate redundancy. The position of each amino acid repeat within proteins, and the order of co-occurring amino acid repeats were also addressed. As a result, we present evidence about the uneven evolution of homorepeats, as well as the functional implications of their relative position in proteins. We discuss some of these cases in their taxonomic context. Collectively, our results show evolutionary and positional signals that suggest that homorepeats have biological function, likely creating unspecific protein interactions or modulating specific interactions in a context-dependent manner. In conclusion, our work supports the functional importance of homorepeats and establishes a basis for the study of other low complexity repeats. Proteins 2017; 85:709–719.

Applied Network Science | 2016

Manifold learning and maximum likelihood estimation for hyperbolic network embedding

Gregorio Alanis-Lobato; Pablo Mier; Miguel A. Andrade-Navarro

The Popularity-Similarity (PS) model holds that clustering and hierarchy, properties common to most networks representing complex systems, are the result of an optimisation process in which nodes seek to form ties, not only with the most connected (popular) system components, but also with those that are similar to them. This model has a geometric interpretation in hyperbolic space, where distances between nodes abstract popularity-similarity trade-offs and the formation of scale-free and strongly clustered networks can be accurately described. Current methods for mapping networks to hyperbolic space are based on maximum likelihood estimations or manifold learning. The former approach is very accurate but slow; the latter improves efficiency at the cost of accuracy. Here, we analyse the strengths and limitations of both strategies and assess the advantages of combining them to efficiently embed big networks, allowing for their examination from a geometric perspective. Our evaluations in artificial and real networks support the idea that hyperbolic distance constraints play a significant role in the formation of edges between nodes. This means that challenging problems in network science, like link prediction or community detection, could be more easily addressed under this geometric framework.

Bioinformatics | 2016

dAPE: a web server to detect homorepeats and follow their evolution

Pablo Mier; Miguel A. Andrade-Navarro

Summary: Homorepeats are low complexity regions consisting of repetitions of a single amino acid residue. There is no current consensus on the minimum number of residues needed to define a functional homorepeat, nor even on whether mismatches are allowed. Here we present dAPE, a web server that helps follow the evolution of homorepeats based on orthology information, using a sensitive but tunable cutoff to help in the identification of emerging homorepeats. Availability and implementation: dAPE can be accessed from http://cbdm-01.zdv.uni-mainz.de/~munoz/polyx. Supplementary information: Supplementary data are available at Bioinformatics online.
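As a rough illustration of what a tunable length cutoff means in practice, the following Python sketch finds perfect runs of a single residue in a protein sequence. It is not dAPE's detection method: the function name, the default cutoff of four residues, and the decision to allow no mismatches are assumptions chosen to keep the example small.

```python
import re


def find_homorepeats(sequence, min_length=4):
    """Return (residue, start, end) for every perfect run of one residue
    that is at least `min_length` long. Coordinates are 0-based, end-exclusive.
    Mismatches are not tolerated in this simplified version."""
    # (.) captures one residue; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(sequence)]
```

Lowering `min_length` makes the search more sensitive: on the toy sequence `MAQQQQQLPAAAK`, a cutoff of four reports only the glutamine run, while a cutoff of three also reports the alanine run.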

Bioinformatics | 2018

The latent geometry of the human protein interaction network

Gregorio Alanis-Lobato; Pablo Mier; Miguel A. Andrade-Navarro

Motivation: A series of recently introduced algorithms and models advocates for the existence of a hyperbolic geometry underlying the network representation of complex systems. Since the human protein interaction network (hPIN) has a complex architecture, we hypothesized that uncovering its latent geometry could ease challenging problems in systems biology, translating them into measuring distances between proteins. Results: We embedded the hPIN to hyperbolic space and found that the inferred coordinates of nodes capture biologically relevant features, like protein age, function and cellular localization. This means that the representation of the hPIN in the two-dimensional hyperbolic plane offers a novel and informative way to visualize proteins and their interactions. We then used these coordinates to compute hyperbolic distances between proteins, which served as likelihood scores for the prediction of plausible protein interactions. Finally, we observed that proteins can efficiently communicate with each other via a greedy routing process, guided by the latent geometry of the hPIN. We show that these efficient communication channels can be used to determine the core members of signal transduction pathways and to study how system perturbations impact their efficiency. Availability and implementation: An R implementation of our network embedder is available at https://github.com/galanisl/NetHypGeom. Also, a web tool for the geometric analysis of the hPIN accompanies this text at http://cbdm-01.zdv.uni-mainz.de/~galanisl/gapi. Supplementary information: Supplementary data are available at Bioinformatics online.
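The distance computation mentioned in the abstract can be sketched with the hyperbolic law of cosines. The Python below assumes polar coordinates (r, theta) in a hyperbolic plane of curvature -1; the function names and the candidate-ranking helper are hypothetical and are not taken from NetHypGeom.

```python
import math


def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points given in polar coordinates,
    assuming a hyperbolic plane of curvature -1."""
    # Smallest angular separation, folded into [0, pi].
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2) % (2 * math.pi))
    cosh_d = (math.cosh(r1) * math.cosh(r2)
              - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    # Clamp to guard against rounding slightly below 1.
    return math.acosh(max(cosh_d, 1.0))


def rank_candidate_links(coords, candidates):
    """Sort candidate node pairs by increasing distance, so the most
    plausible links (shortest distances) come first.
    `coords` maps a node name to its (r, theta) tuple."""
    return sorted(candidates,
                  key=lambda pair: hyperbolic_distance(*coords[pair[0]], *coords[pair[1]]))
```

Two points at the same angle are separated by the difference of their radii, and two points at opposite angles by the sum, which gives a quick sanity check for the formula.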

Genome Biology and Evolution | 2018

Glutamine Codon Usage and polyQ Evolution in Primates Depend on the Q Stretch Length

Pablo Mier; Miguel A. Andrade-Navarro

Amino acid usage in a proteome depends mostly on its taxonomy, as does codon usage in transcriptomes. Here, we explore the level of variation in the codon usage of a specific amino acid, glutamine, in relation to the number of consecutive glutamine residues. We show that CAG triplets are consistently more abundant in short glutamine homorepeats (polyQ, four to eight residues) than in shorter glutamine stretches (one to three residues), leading to the evolutionary growth of the repeat region in a CAG-dependent manner. The length of orthologous polyQ regions is mostly stable in primates, particularly the short ones. Interestingly, given a short polyQ the CAG usage is higher in unstable-in-length orthologous polyQ regions. This indicates that CAG triplets produce the necessary instability for a glutamine stretch to grow. Proteins related to polyQ-associated diseases behave in a more extreme way, with longer glutamine stretches in human and evolutionarily closer nonhuman primates, and an overall higher CAG usage. In the light of our results, we suggest an evolutionary model to explain the glutamine codon usage in polyQ regions.
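The quantity compared in this abstract, CAG usage as a function of glutamine stretch length, can be computed along the following lines. This Python sketch assumes an in-frame coding sequence and relies on the standard genetic code, in which glutamine is encoded by CAA and CAG; the function name and output format are illustrative and do not reproduce the authors' pipeline.

```python
from itertools import groupby

GLN_CODONS = {"CAA", "CAG"}  # glutamine codons in the standard genetic code


def glutamine_stretches(cds):
    """Return (length_in_residues, cag_fraction) for each run of
    consecutive glutamine codons in an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    stretches = []
    for is_gln, group in groupby(codons, key=lambda codon: codon in GLN_CODONS):
        if is_gln:
            run = list(group)
            stretches.append((len(run), run.count("CAG") / len(run)))
    return stretches
```

Grouping the resulting tuples by length (for example one to three residues versus four to eight) would then allow the CAG fractions of the two classes to be compared.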

Bioinformatics | 2018

Traitpedia: a collaborative effort to gather species traits

Pablo Mier; Miguel A. Andrade-Navarro

Summary: Traitpedia is a collaborative database that aims to collect binary traits in tabular form for a growing number of species. Availability and implementation: Traitpedia can be accessed from http://cbdm-01.zdv.uni-mainz.de/~munoz/traitpedia. Supplementary information: Supplementary data are available at Bioinformatics online.

BMC Research Notes | 2018

Proteome-wide comparison between the amino acid composition of domains and linkers

Daniel Brüne; Miguel A. Andrade-Navarro; Pablo Mier

Objective: Amino acid composition is a sequence feature that has been extensively used to characterize proteomes of many species and protein families. Yet the analysis of amino acid composition of protein domains and the linkers connecting them has received less attention. Here, we perform both a comprehensive full-proteome amino acid composition analysis and a similar analysis focusing on domains and linkers, to uncover domain- or linker-specific differential amino acid usage patterns. Results: The amino acid composition in the 38 proteomes studied showcases the greater variability found in archaea and bacteria species compared to eukaryotes. When focusing on domains and linkers, we describe the preferential use of polar residues in linkers and hydrophobic residues in domains. To let any user perform this analysis on a given domain (or set of them), we developed a dedicated R script called RACCOON, which can be easily used and can provide interesting insights into the compositional differences between a domain and its surrounding linkers.
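The domain-versus-linker comparison can be illustrated with a short Python sketch. This is not RACCOON, which is an R script whose interface is not described here; the function names and the representation of domains as 0-based, end-exclusive coordinate pairs are assumptions.

```python
from collections import Counter


def composition(sequence):
    """Relative frequency of each amino acid in a sequence."""
    counts = Counter(sequence)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def domain_linker_composition(sequence, domains):
    """Split a protein into domain and linker residues and return the
    composition of each part. `domains` is a list of (start, end) pairs,
    0-based and end-exclusive; everything outside them counts as linker."""
    in_domain = [False] * len(sequence)
    for start, end in domains:
        for i in range(start, min(end, len(sequence))):
            in_domain[i] = True
    domain_seq = "".join(aa for aa, flag in zip(sequence, in_domain) if flag)
    linker_seq = "".join(aa for aa, flag in zip(sequence, in_domain) if not flag)
    return composition(domain_seq), composition(linker_seq)
```

Comparing the two returned dictionaries residue by residue shows which amino acids are enriched in domains and which in linkers for the annotated protein.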

Collaboration

Dive into Pablo Mier's collaborations.

Top Co-Authors

Miguel A. Andrade-Navarro

University of Mainz

Gregorio Alanis-Lobato

University of Mainz

Daniel Brüne

Heidelberg University

Emmanuel G. Reynaud

University College Dublin
