Ioannis Filippis
Imperial College London
Publications
Featured research published by Ioannis Filippis.
Journal of Molecular Biology | 2014
Christopher M. Yates; Ioannis Filippis; Lawrence A. Kelley; Michael J. E. Sternberg
Whole-genome and exome sequencing studies reveal many genetic variants between individuals, some of which are linked to disease. Many of these variants lead to single amino acid variants (SAVs), and accurate prediction of their phenotypic impact is important. Incorporating sequence conservation and network-level features, we have developed a method, SuSPect (Disease-Susceptibility-based SAV Phenotype Prediction), for predicting how likely SAVs are to be associated with disease. SuSPect performs significantly better than other available batch methods on the VariBench benchmarking dataset, with a balanced accuracy of 82%. SuSPect is available at www.sbg.bio.ic.ac.uk/suspect. The Web site has been implemented in Perl and SQLite and is compatible with modern browsers. An SQLite database of possible missense variants in the human proteome is available to download at www.sbg.bio.ic.ac.uk/suspect/download.html.
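The abstract reports a balanced accuracy but does not spell out the formula. A minimal sketch, assuming the standard definition (the mean of sensitivity and specificity for a binary disease/neutral classification), is given below. The function name and the toy labels are hypothetical and are not taken from SuSPect or VariBench.

```python
def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity for binary labels.

    `labels` and `predictions` are equal-length sequences of booleans,
    where True means "disease-associated" (an assumed encoding).
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y and p)
    true_neg = sum(1 for y, p in pairs if not y and not p)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    sensitivity = true_pos / positives   # fraction of disease SAVs recovered
    specificity = true_neg / negatives   # fraction of neutral SAVs recovered
    return (sensitivity + specificity) / 2


# Toy example: 4 disease-associated and 2 neutral variants.
toy_labels = [True, True, True, True, False, False]
toy_predictions = [True, True, True, False, False, True]
toy_score = balanced_accuracy(toy_labels, toy_predictions)
```

Unlike plain accuracy, this measure is not inflated when one class dominates the benchmark, which is presumably why it is used for imbalanced variant datasets.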
Nucleic Acids Research | 2012
Tony E. Lewis; Ian Sillitoe; Antonina Andreeva; Tom L. Blundell; Daniel W. A. Buchan; Cyrus Chothia; Alison L. Cuff; Jose M. Dana; Ioannis Filippis; Julian Gough; Sarah Hunter; David Jones; Lawrence A. Kelley; Gerard J. Kleywegt; Federico Minneci; Alex L. Mitchell; Alexey G. Murzin; Bernardo Ochoa-Montaño; Owen J. L. Rackham; James C. Smith; Michael J. E. Sternberg; Sameer Velankar; Corin Yeats; Christine A. Orengo
Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence–structure–function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker’s yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
BMC Bioinformatics | 2010
Jose M. Duarte; Rajagopal Sathyapriya; Henning Stehr; Ioannis Filippis; Michael Lappe
Background: Contact maps have been extensively used as a simplified representation of protein structures. They capture the most important features of a protein's fold and are preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity, many groups have dedicated considerable effort to contact prediction as a proxy for protein structure prediction. However, a contact map's biological interest is subject to the availability of reliable methods for the 3-dimensional reconstruction of the structure. Results: We use an implementation of the well-known distance geometry protocol to build realistic protein 3-dimensional models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability and c) what is the effect of partial or inaccurate contact information on the 3D structure recovery. Our results suggest that contact maps derived from the application of a distance cutoff of 9 to 11 Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2 Å RMSD of the crystal structure and with lower values for the pairwise average ensemble RMSD. Interestingly, it is still possible to recover a structure with partial contact information, although wrong contacts can lead to a dramatic loss in reconstruction fidelity. Conclusions: Thus contact maps represent a valid approximation to the structures with an accuracy comparable to that of experimental methods. The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap.
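To make the notion of a contact map concrete, here is a minimal sketch of deriving one from a distance cutoff. It assumes one representative atom per residue (for example Cβ) given as (x, y, z) coordinates in ångströms. The function name, the `min_seq_sep` parameter and the toy coordinates are illustrative assumptions, not the authors' implementation.

```python
import math


def contact_map(coords, cutoff=9.0, min_seq_sep=1):
    """Return the set of residue index pairs (i, j), with i < j, whose
    representative atoms lie within `cutoff` angstroms of each other.

    `min_seq_sep` is the minimum sequence separation j - i for a pair
    to be considered (1 means every pair of distinct residues).
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_seq_sep, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


# Toy chain: four atoms spaced 3.8 angstroms apart along the x axis.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_contacts = contact_map(toy_coords, cutoff=9.0)
```

With a 9 Å cutoff, only the two end atoms of the toy chain (11.4 Å apart) are not in contact. The double loop is quadratic in chain length, which is adequate for single proteins.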
PLOS ONE | 2009
Tijana Milenkovic; Ioannis Filippis; Michael Lappe; Nataša Pržulj
Much attention has recently been given to the statistical significance of topological features observed in biological networks. Here, we consider residue interaction graphs (RIGs) as network representations of protein structures with residues as nodes and inter-residue interactions as edges. Degree-preserving randomized models have been widely used for this purpose in biomolecular networks. However, such a single summary statistic of a network may not be detailed enough to capture the complex topological characteristics of protein structures and their network counterparts. Here, we investigate a variety of topological properties of RIGs to find a well-fitting network null model for them. The RIGs are derived from a structurally diverse protein data set at various distance cut-offs and for different groups of interacting atoms. We compare the network structure of RIGs to several random graph models. We show that 3-dimensional geometric random graphs, which model spatial relationships between objects, provide the best fit to RIGs. We investigate the relationship between the strength of the fit and various protein structural features. We show that the fit depends on protein size, structural class, and thermostability, but not on quaternary structure. We apply our model to the identification of significantly over-represented structural building blocks, i.e., network motifs, in protein structure networks. As expected, choosing geometric graphs as a null model results in the most specific identification of motifs. Our geometric random graph model may facilitate further graph-based studies of protein conformation space and have important implications for protein structure comparison and prediction. The choice of a well-fitting null model is crucial for finding structural motifs that play an important role in protein folding, stability and function. To our knowledge, this is the first study that addresses the challenge of finding an optimized null model for RIGs, by comparing various RIG definitions against a series of network models.
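For readers unfamiliar with the null model named in this abstract, a 3-dimensional geometric random graph can be sketched as follows: nodes are placed uniformly at random in a unit cube, and two nodes are joined when their Euclidean distance is at most a chosen radius. The function name, the unit-cube choice and the parameters are assumptions made for illustration. The paper's own generator and parameter fitting are not described in the abstract.

```python
import math
import random


def geometric_random_graph_3d(n_nodes, radius, seed=None):
    """Place `n_nodes` points uniformly in the unit cube and connect every
    pair whose Euclidean distance is at most `radius`.

    Returns (points, edges), where edges is a set of index pairs (i, j), i < j.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(n_nodes)]
    edges = {
        (i, j)
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
        if math.dist(points[i], points[j]) <= radius
    }
    return points, edges


# Example: 50 nodes with a connection radius of 0.3 (arbitrary values).
example_points, example_edges = geometric_random_graph_3d(50, 0.3, seed=1)
```

The analogy with RIGs is direct: residues play the role of the points and the distance cut-off plays the role of the radius, which gives some intuition for why a spatial model could fit better than a degree-preserving rewiring.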
PLOS Computational Biology | 2009
Rajagopal Sathyapriya; Jose M. Duarte; Henning Stehr; Ioannis Filippis; Michael Lappe
The network of native non-covalent residue contacts determines the three-dimensional structure of a protein. However, not all contacts are of equal structural significance, and little knowledge exists about a minimal, yet sufficient, subset required to define the global features of a protein. Characterisation of this “structural essence” has remained elusive so far: no algorithmic strategy has been devised to date that could outperform a random selection in terms of 3D reconstruction accuracy (measured as the Cα RMSD). Identifying the number and nature of essential native contacts is not only of theoretical interest (i.e., for design of advanced statistical potentials); such a subset of spatial constraints is also very useful in a number of novel experimental methods (like EPR) which rely heavily on constraint-based protein modelling. To derive accurate three-dimensional models from distance constraints, we implemented a reconstruction pipeline using distance geometry. We selected a test set of 12 protein structures from the four major SCOP fold classes and performed our reconstruction analysis. As a reference set, a series of random subsets (ranging from 10% to 90% of native contacts) is generated for each protein, and the reconstruction accuracy is computed for each subset. We have developed a rational strategy, termed “cone-peeling”, that combines sequence features and network descriptors to select minimal subsets that outperform the reference sets. We present, for the first time, a rational strategy to derive a structural essence of residue contacts and provide an estimate of the size of this minimal subset. Our algorithm computes sparse subsets capable of determining the tertiary structure at approximately 4.8 Å Cα RMSD with as few as 8% of the native contacts (Cα-Cα and Cβ-Cβ). At the same time, a randomly chosen subset of native contacts needs about twice as many contacts to reach the same level of accuracy. This “structural essence” opens new avenues in the fields of structure prediction, empirical potentials and docking.
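The random reference sets mentioned in this abstract can be sketched as below. This is only an illustration of uniform random subsampling of a contact list. The 10% step size, the single replicate per fraction, the function name and the toy contacts are all assumptions, since the abstract states only the 10% to 90% range. The cone-peeling selection itself is not reproduced here.

```python
import random


def random_contact_subsets(contacts, fractions=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), seed=None):
    """Draw one uniformly random subset of `contacts` for each fraction.

    Returns a dict mapping each fraction to a set of contacts whose size is
    the fraction of the total, rounded to the nearest integer.
    """
    rng = random.Random(seed)
    ordered = sorted(contacts)  # fixed order so a given seed is reproducible
    subsets = {}
    for fraction in fractions:
        size = round(fraction * len(ordered))
        subsets[fraction] = set(rng.sample(ordered, size))
    return subsets


# Toy input: 20 hypothetical contacts between residues i and i + 3.
toy_native_contacts = {(i, i + 3) for i in range(20)}
toy_subsets = random_contact_subsets(toy_native_contacts, fractions=(0.1, 0.5, 0.9), seed=7)
```

In a real comparison, each subset would be passed to the reconstruction pipeline and scored by RMSD against the native structure, typically over several random replicates per fraction.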
Nucleic Acids Research | 2015
Tony E. Lewis; Ian Sillitoe; Antonina Andreeva; Tom L. Blundell; Daniel W. A. Buchan; Cyrus Chothia; Domenico Cozzetto; Jose M. Dana; Ioannis Filippis; Julian Gough; David Jones; Lawrence A. Kelley; Gerard J. Kleywegt; Federico Minneci; Jaina Mistry; Alexey G. Murzin; Bernardo Ochoa-Montaño; Matt E. Oates; Marco Punta; Owen J. L. Rackham; Jonathan Stahlhacke; Michael J. E. Sternberg; Sameer Velankar; Christine A. Orengo
Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.
Structure | 2017
Joe G. Greener; Ioannis Filippis; Michael J. E. Sternberg
Summary: The related concepts of protein dynamics, conformational ensembles and allostery are often difficult to study with molecular dynamics (MD) due to the timescales involved. We present ExProSE (Exploration of Protein Structural Ensembles), a distance geometry-based method that generates an ensemble of protein structures from two input structures. ExProSE provides a unified framework for the exploration of protein structure and dynamics in a fast and accessible way. Using a dataset of apo/holo pairs, it is shown that existing coarse-grained methods often cannot span large conformational changes. For T4-lysozyme, ExProSE is able to generate ensembles that are more native-like than tCONCOORD and NMSim, and comparable with targeted MD. By adding constraints representing potential modulators, ExProSE can predict allosteric sites. ExProSE ranks an allosteric pocket first or second for 27 out of 58 allosteric proteins, which is similar and complementary to existing methods. The ExProSE source code is freely available.
Plant Journal | 2013
Ioannis Filippis; Rosa Lopez-Cobollo; James Abbott; Sarah Butcher; Gerard J. Bishop
Plant organs are made from multiple cell types, and defining the expression level of a gene in any one cell or group of cells from a complex mixture is difficult. Dicotyledonous plants normally have three distinct layers of cells, L1, L2 and L3. Layer L1 is the single layer of cells making up the epidermis, layer L2 is the single-cell sub-epidermal layer, and layer L3 constitutes the rest of the internal cells. Here we show how it is possible to harvest an organ and characterise the level of layer-specific expression by using a periclinal chimera that has its L1 layer from Solanum pennellii and its L2 and L3 layers from Solanum lycopersicum. This is possible by measuring the frequency of species-specific transcripts. RNA-seq analysis enabled the genome-wide assessment of whether a gene is expressed in the L1 or L2/L3 layers. From 13 277 genes that are expressed in both the chimera and the parental lines and with at least one polymorphism between the parental alleles, we identified 382 genes that are preferentially expressed in L1 in contrast to 1159 genes in L2/L3. Gene ontology analysis shows that many genes preferentially expressed in L1 are involved in cutin and wax biosynthesis, whereas numerous genes that are preferentially expressed in L2/L3 tissue are associated with chloroplastic processes. These data indicate the use of such chimeras and provide detailed information on the level of layer-specific expression of genes.
Philosophical Transactions of the Royal Society A | 2012
Jeremy Cohen; Ioannis Filippis; Mark Woodbridge; Daniela Bauer; Neil Chue Hong; Mike Jackson; Sarah Butcher; David Colling; John Darlington; Brian Fuchs; M. J. Harvey
Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.
Current Opinion in Biotechnology | 2009
Michael Lappe; Ganesh Bagler; Ioannis Filippis; Henning Stehr; Jose M. Duarte; Rajagopal Sathyapriya
Novel high-throughput technologies for directed evolution enable experimental coverage of an impressive number of sequences. Nevertheless, the success of such experiments hinges on the initial sequence libraries. Here we consider the computational design of smart focused libraries and review insights from experimental strategies and theoretic advances in modelling their energy landscapes. In library design as in structure prediction, the applied energy function is the key. Current knowledge-based potentials have proven more successful than purely physics-based ones. Here we summarize novel approaches that extend the classical pairwise treatment of residue contacts towards adaptive knowledge-based multi-body potentials. We suggest that minimal sets of probabilistic constraints will lead to much more efficient sampling of permissible conformations and sequence space.