Publication


Featured research published by Emidio Capriotti.


Nucleic Acids Research | 2005

I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure

Emidio Capriotti; Piero Fariselli; Rita Casadio

I-Mutant2.0 is a support vector machine (SVM)-based tool for the automatic prediction of protein stability changes upon single point mutations. I-Mutant2.0 predictions are performed starting either from the protein structure or, more importantly, from the protein sequence. This latter task, to the best of our knowledge, is exploited for the first time. The method was trained and tested on a data set derived from ProTherm, which is presently the most comprehensive available database of thermodynamic experimental data of free energy changes of protein stability upon mutation under different conditions. I-Mutant2.0 can be used both as a classifier for predicting the sign of the protein stability change upon mutation and as a regression estimator for predicting the related ΔΔG values. Acting as a classifier, I-Mutant2.0 correctly predicts (with a cross-validation procedure) 80% or 77% of the data set, depending on the usage of structural or sequence information, respectively. When predicting ΔΔG values associated with mutations, the correlation of predicted with expected/experimental values is 0.71 (with a standard error of 1.30 kcal/mol) when structural information is used and 0.62 (with a standard error of 1.45 kcal/mol) when sequence information is used. Our web interface allows the selection of a predictive mode that depends on the availability of the protein structure and/or sequence. In this latter case, the web server requires only pasting of a protein sequence in a raw format. We therefore introduce I-Mutant2.0 as a unique and valuable helper for protein design, even when the protein structure is not yet known with atomic resolution. Availability: .


Bioinformatics | 2006

Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information

Emidio Capriotti; Remo Calabrese; Rita Casadio

MOTIVATION: Human single nucleotide polymorphisms (SNPs) are the most frequent type of genetic variation in the human population. One of the most important goals of SNP projects is to understand which human genotype variations are related to Mendelian and complex diseases. Great interest is focused on non-synonymous coding SNPs (nsSNPs) that are responsible for protein single point mutations. nsSNPs can be neutral or disease associated. It is known that the mutation of only one residue in a protein sequence can be related to a number of pathological conditions of dramatic social impact such as Alzheimer's, Parkinson's and Creutzfeldt-Jakob diseases. The quality and completeness of presently available SNP databases allow the application of machine learning techniques to predict the insurgence of human diseases due to single point protein mutation starting from the protein sequence. RESULTS: In this paper, we develop a method based on support vector machines (SVMs) that, starting from the protein sequence information, can predict whether a new phenotype derived from an nsSNP can be related to a genetic disease in humans. Using a dataset of 21 185 single point mutations, 61% of which are disease-related, out of 3587 proteins, we show that our predictor can reach more than 74% accuracy in the specific task of predicting whether a single point mutation can be disease related or not. Our method, although based on less information, outperforms other web-available predictors implementing different approaches. AVAILABILITY: A beta version of the web tool is available at http://gpcr.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi


Human Mutation | 2009

Functional annotations improve the predictive score of human disease-related mutations in proteins

Remo Calabrese; Emidio Capriotti; Piero Fariselli; Pier Luigi Martelli; Rita Casadio

Single nucleotide polymorphisms (SNPs) are the simplest and most frequent form of human DNA variation, also valuable as genetic markers of disease susceptibility. The most investigated SNPs are missense mutations resulting in residue substitutions in the protein. Here we propose SNPs&GO, an accurate method that, starting from a protein sequence, can predict whether a mutation is disease related or not by exploiting the protein functional annotation. The scoring efficiency of SNPs&GO is as high as 82%, with a Matthews correlation coefficient equal to 0.63 over a wide set of annotated nonsynonymous mutations in proteins, including 16,330 disease‐related and 17,432 neutral polymorphisms. SNPs&GO collects in a unique framework information derived from protein sequence, evolutionary information, and function as encoded in the Gene Ontology terms, and outperforms other available predictive methods. Hum Mutat 30:1–8, 2009.
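
The Matthews correlation coefficient quoted above can be computed from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard formula, not code from SNPs&GO; the function names and the toy counts are illustrative assumptions and are unrelated to the 16,330 and 17,432 variants in the abstract.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when a marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy counts (disease-related = positive class).
toy = dict(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy = {accuracy(**toy):.2f}")
print(f"MCC      = {matthews_corrcoef(**toy):.2f}")
```

Unlike plain accuracy, the coefficient stays near zero for a predictor that always outputs the majority class, which is why it is often reported alongside accuracy on datasets with unequal class sizes.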


Nature Structural & Molecular Biology | 2011

The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules

Davide Baù; Amartya Sanyal; Bryan R. Lajoie; Emidio Capriotti; Meg Byron; Jeanne B. Lawrence; Job Dekker; Marc A. Marti-Renom

We developed a general approach that combines chromosome conformation capture carbon copy (5C) with the Integrated Modeling Platform (IMP) to generate high-resolution three-dimensional models of chromatin at the megabase scale. We applied this approach to the ENm008 domain on human chromosome 16, containing the α-globin locus, which is expressed in K562 cells and silenced in lymphoblastoid cells (GM12878). The models accurately reproduce the known looping interactions between the α-globin genes and their distal regulatory elements. Further, we find using our approach that the domain folds into a single globular conformation in GM12878 cells, whereas two globules are formed in K562 cells. The central cores of these globules are enriched for transcribed genes, whereas nontranscribed chromatin is more peripheral. We propose that globule formation represents a higher-order folding state related to clustering of transcribed genes around shared transcription machineries, as previously observed by microscopy.


BMC Bioinformatics | 2008

A three-state prediction of single point mutations on protein stability changes

Emidio Capriotti; Piero Fariselli; Ivan Rossi; Rita Casadio

Background: A basic question of protein structural studies is to what extent mutations affect stability. This question may be addressed starting from sequence and/or from structure. In proteomics and genomics studies, prediction of the protein stability free energy change (ΔΔG) upon single point mutation may also help the annotation process. The experimental ΔΔG values are affected by uncertainty as measured by standard deviations. Most of the ΔΔG values are nearly zero (about 32% of the ΔΔG data set ranges from −0.5 to 0.5 kcal/mol) and both the value and sign of ΔΔG may be either positive or negative for the same mutation, blurring the relationship between mutations and the expected ΔΔG value. In order to overcome this problem we describe a new predictor that discriminates between three mutation classes: destabilizing mutations (ΔΔG<−1.0 kcal/mol), stabilizing mutations (ΔΔG>1.0 kcal/mol) and neutral mutations (−1.0≤ΔΔG≤1.0 kcal/mol). Results: In this paper a support vector machine starting from the protein sequence or structure discriminates between stabilizing, destabilizing and neutral mutations. We rank all the possible substitutions according to a three-state classification system and show that the overall accuracy of our predictor is as high as 56% when performed starting from sequence information and 61% when the protein structure is available, with a mean value correlation coefficient of 0.27 and 0.35, respectively. These values are about 20 percentage points higher than those of a random predictor. Conclusions: Our method improves the quality of the prediction of the free energy change due to single point protein mutations by adopting a hypothesis of thermodynamic reversibility of the existing experimental data. By this we both recast the thermodynamic symmetry of the problem and balance the distribution of the available experimental measurements of free energy changes. This eliminates possible overestimations of the previously described methods trained on an unbalanced data set comprising a number of destabilizing mutations higher than stabilizing ones.
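
The three-class thresholds and the reversibility hypothesis described above can be illustrated with a short sketch. It assumes that reversibility means the reverse mutation carries the opposite-sign ΔΔG; the `Mutation` type, the function names, and the example values are hypothetical and are not taken from the paper or from ProTherm.

```python
from typing import List, NamedTuple


class Mutation(NamedTuple):
    wild_type: str   # residue before mutation (one-letter code)
    position: int    # position in the sequence
    mutant: str      # residue after mutation
    ddg: float       # free energy change in kcal/mol


def three_state_label(ddg: float, threshold: float = 1.0) -> str:
    """Map a ddG value to the three classes given in the abstract."""
    if ddg < -threshold:
        return "destabilizing"
    if ddg > threshold:
        return "stabilizing"
    return "neutral"


def add_reverse_mutations(mutations: List[Mutation]) -> List[Mutation]:
    """Assumed reversibility: for every A->B with ddG, add B->A with -ddG."""
    reverse = [Mutation(m.mutant, m.position, m.wild_type, -m.ddg)
               for m in mutations]
    return list(mutations) + reverse


# Hypothetical measurements, skewed toward destabilizing mutations.
data = [
    Mutation("A", 10, "G", -2.3),
    Mutation("L", 42, "P", -1.8),
    Mutation("K", 77, "E", 0.4),
]
balanced = add_reverse_mutations(data)
labels = [three_state_label(m.ddg) for m in balanced]
print(labels)
```

Under this assumption every destabilizing example gains a stabilizing counterpart, so the augmented set has equal numbers of the two non-neutral classes.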


Bioinformatics | 2011

Bioinformatics challenges for personalized medicine

Guy Haskin Fernald; Emidio Capriotti; Roxana Daneshjou; Konrad J. Karczewski; Russ B. Altman

Motivation: Widespread availability of low-cost, full genome sequencing will introduce new challenges for bioinformatics. Results: This review outlines recent developments in sequencing technologies and genome analysis methods for application in personalized medicine. New methods are needed in four areas to realize the potential of personalized medicine: (i) processing large-scale robust genomic data; (ii) interpreting the functional effect and the impact of genomic variation; (iii) integrating systems data to relate complex genetic interactions with phenotypes; and (iv) translating these discoveries into medical practice. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Intelligent Systems in Molecular Biology | 2004

A neural-network-based method for predicting protein stability changes upon single point mutations

Emidio Capriotti; Piero Fariselli; Rita Casadio

MOTIVATION: One important requirement for protein design is to be able to predict changes of protein stability upon mutation. Different methods addressing this task have been described and their performance tested considering global linear correlation between predicted and experimental data. Neither is direct statistical evaluation of their prediction performance available, nor is a direct comparison among different approaches possible. Recently, a significant database of thermodynamic data on protein stability changes upon single point mutation has been generated (ProTherm). This allows the application of machine learning techniques to predicting free energy stability changes upon mutation starting from the protein sequence. RESULTS: In this paper, we present a neural-network-based method to predict if a given mutation increases or decreases the protein thermodynamic stability with respect to the native structure. Using a dataset consisting of 1615 mutations, our predictor correctly classifies >80% of the mutations in the database. On the same task and using the same data, our predictor performs better than other methods available on the Web. Moreover, when our system is coupled with energy-based methods, the joint prediction accuracy increases up to 90%, suggesting that it can be used to increase also the performance of pre-existing methods, and generally to improve protein design strategies. AVAILABILITY: The server is under construction and will be available at http://www.biocomp.unibo.it


European Conference on Computational Biology | 2005

Predicting protein stability changes from sequences using support vector machines

Emidio Capriotti; Piero Fariselli; Remo Calabrese; Rita Casadio

MOTIVATION: The prediction of protein stability change upon mutations is key to understanding protein folding and misfolding. At present, methods are available to predict stability changes only when the atomic structure of the protein is available. Methods addressing the same task starting from the protein sequence are, however, necessary in order to complete genome annotation, especially in relation to single nucleotide polymorphisms (SNPs) and related diseases. RESULTS: We develop a method based on support vector machines that, starting from the protein sequence, predicts the sign and the value of free energy stability change upon single point mutation. We show that the accuracy of our predictor is as high as 77% in the specific task of predicting the ΔΔG sign related to the corresponding protein stability. When predicting the ΔΔG values, a satisfactory correlation agreement with the experimental data is also found. As a final blind benchmark, the predictor is applied to proteins with a set of disease-related SNPs, for which thermodynamic data are also known. We found that our predictions corroborate the view that disease-related mutations correspond to a decrease in protein stability. AVAILABILITY: http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi


PLOS Genetics | 2011

Phased Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele Reference Sequence

Frederick E. Dewey; Rong Chen; Sergio Cordero; Kelly E. Ormond; Colleen Caleshu; Konrad J. Karczewski; Michelle Whirl-Carrillo; Matthew T. Wheeler; Joel T. Dudley; Jake K. Byrnes; Omar E. Cornejo; Joshua W. Knowles; Mark Woon; Li Gong; Caroline F. Thorn; Joan M. Hebert; Emidio Capriotti; Sean P. David; Aleksandra Pavlovic; Anne West; Joseph V. Thakuria; Madeleine Ball; Alexander Wait Zaranek; Heidi L. Rehm; George M. Church; John West; Carlos Bustamante; Michael Snyder; Russ B. Altman; Teri E. Klein

Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (<1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.


BMC Genomics | 2013

Collective judgment predicts disease-associated single nucleotide variants

Emidio Capriotti; Russ B. Altman; Yana Bromberg

Background: In recent years the number of human genetic variants deposited into the publicly available databases has been increasing exponentially. The latest version of dbSNP, for example, contains ~50 million validated Single Nucleotide Variants (SNVs). SNVs make up most of human variation and are often the primary causes of disease. The non-synonymous SNVs (nsSNVs) result in single amino acid substitutions and may affect protein function, often causing disease. Although several methods for the detection of nsSNV effects have already been developed, the consistent increase in annotated data is offering the opportunity to improve prediction accuracy. Results: Here we present a new approach for the detection of disease-associated nsSNVs (Meta-SNP) that integrates four existing methods: PANTHER, PhD-SNP, SIFT and SNAP. We first tested the accuracy of each method using a dataset of 35,766 disease-annotated mutations from 8,667 proteins extracted from the SwissVar database. The four methods reached overall accuracies of 64%-76% with a Matthews correlation coefficient (MCC) of 0.38-0.53. We then used the outputs of these methods to develop a machine learning based approach that discriminates between disease-associated and polymorphic variants (Meta-SNP). In testing, the combined method reached 79% overall accuracy and 0.59 MCC, ~3% higher accuracy and ~0.05 higher correlation with respect to the best-performing method. Moreover, for the hardest-to-define subset of nsSNVs, i.e. variants for which half of the predictors disagreed with the other half, Meta-SNP attained 8% higher accuracy than the best predictor. Conclusions: Here we find that the Meta-SNP algorithm achieves better performance than the best single predictor. This result suggests that the methods used for the prediction of variant-disease associations are orthogonal, encoding different biologically relevant relationships. Careful combination of predictions from various resources is therefore a good strategy for the selection of high reliability predictions. Indeed, for the subset of nsSNVs where all predictors were in agreement (46% of all nsSNVs in the set), our method reached 87% overall accuracy and 0.73 MCC. Meta-SNP server is freely accessible at http://snps.biofold.org/meta-snp.
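
The abstract reports results separately for variants on which all four predictors agree and for variants on which they split evenly. The sketch below shows one way such subsets could be identified from binary predictor calls. It is not the Meta-SNP combiner, which the abstract describes only as a machine learning approach over the predictor outputs; the function name, the category names, and the example calls are hypothetical.

```python
from collections import Counter
from typing import Mapping


def agreement_category(calls: Mapping[str, bool]) -> str:
    """Classify a variant by how its predictor calls agree.

    `calls` maps a predictor name to True when that predictor
    labels the variant as disease-associated.
    """
    n = len(calls)
    if n == 0:
        raise ValueError("at least one predictor call is required")
    n_disease = sum(1 for is_disease in calls.values() if is_disease)
    if n_disease == 0 or n_disease == n:
        return "unanimous"
    if 2 * n_disease == n:
        return "split"  # half of the predictors disagree with the other half
    return "majority"


# Hypothetical calls for three variants (not real predictor output).
variants = {
    "var1": {"PANTHER": True, "PhD-SNP": True, "SIFT": True, "SNAP": True},
    "var2": {"PANTHER": True, "PhD-SNP": True, "SIFT": False, "SNAP": False},
    "var3": {"PANTHER": True, "PhD-SNP": False, "SIFT": False, "SNAP": False},
}
categories = {name: agreement_category(c) for name, c in variants.items()}
print(Counter(categories.values()))
```

Accuracy and MCC can then be reported per category, which mirrors how the abstract contrasts the unanimous subset with the evenly split one.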

Collaboration



Top Co-Authors

Yana Bromberg

University of Alabama at Birmingham
