Vanessa E. Gray
University of Washington
Publication
Featured research published by Vanessa E. Gray.
Nature Methods | 2012
Sudhir Kumar; Maxwell Sanderford; Vanessa E. Gray; Jieping Ye; Li Liu
The lack of a common file format has been a significant barrier to the effective sharing of software tools and analysis techniques. As a solution, CXIDB is standardized on CXI files, which are based on the HDF5 format (http://hdfgroup.org/). In a CXI file, every measurement is represented by an entry that aims to contain all the required information for its measurement’s interpretation (Fig. 1d). The entries contain several defined standardized groups that store the data along with metadata such as experimental conditions and instruments. The dictionary of defined groups can be extended to accommodate new types of metadata, and additional data (such as notebooks) can be added to the corresponding CXIDB entry in auxiliary files. Data depositors are encouraged to list publications that can be used both to cite the data entry and to document how the data were collected and processed. CXIDB includes a catalog of publicly available software to provide newcomers with a list of useful resources. The availability of high-quality software will be crucial for the progress of coherent X-ray imaging, just as it was for X-ray crystallography. The Protein Data Bank is a remarkable manifestation of such a development. At the moment, the CXIDB stores about 6.8 million images, which represents a mere 16 hours of operation of the LCLS. With 10 million shots per day possible, and a few billion shots per day expected at the European X-ray Free-electron Laser, significant expansion of the data bank is anticipated. The data bank is open for deposition to anyone. To deposit data, contact cxidb@cxidb.org or visit http://cxidb.org/.
Bioinformatics | 2012
Vanessa E. Gray; Kimberly R. Kukurba; Sudhir Kumar
Summary: Site-directed mutagenesis is frequently used by scientists to investigate the functional impact of amino acid mutations in the laboratory. Over 10 000 such laboratory-induced mutations have been reported in the UniProt database along with the outcomes of functional assays. Here, we explore the performance of state-of-the-art computational tools (Condel, PolyPhen-2 and SIFT) in correctly annotating the function-altering potential of 10 913 laboratory-induced mutations from 2372 proteins. We find that computational tools are very successful in diagnosing laboratory-induced mutations that elicit significant functional change in the laboratory (up to 92% accuracy). However, these tools consistently fail to correctly annotate laboratory-induced mutations that show no functional impact in the laboratory assays. Therefore, the overall accuracy of computational tools for laboratory-induced mutations is much lower than that observed for the naturally occurring human variants. We tested and rejected the possibilities that the preponderance of changes to alanine and the presence of multiple base-pair mutations in the laboratory were the reasons for the observed discordance between the performance of computational tools for natural and laboratory mutations. Instead, we discover that the laboratory-induced mutations occur predominantly at the highly conserved positions in proteins, where the computational tools have the lowest accuracy of correct prediction for variants that do not impact function (neutral). Therefore, the comparisons of experimental-profiling results with those from computational predictions need to be sensitive to the evolutionary conservation of the positions harboring the amino acid change.
Molecular Biology and Evolution | 2011
Vanessa E. Gray; Sudhir Kumar
Posttranslational modifications (PTMs) are chemical alterations that are critical to protein conformation and activation states. Despite their functional importance and reported involvement in many diseases, evolutionary analyses have produced enigmatic results because only weak or no selective pressures have been attributed to many types of PTMs. In a large-scale analysis of 16,836 PTM positions from 4,484 human proteins, we find that positions harboring PTMs show evidence of higher purifying selection in 70% of the phosphorylated and N-linked glycosylated proteins. The purifying selection is up to 42% more severe at PTM residues as compared with the corresponding unmodified amino acids. These results establish extensive selective pressures in the long-term history of positions that experience PTMs in the human proteins. Our findings will enhance our understanding of the historical function of PTMs over time and help in predicting PTM positions by using evolutionary comparisons.
Genetics | 2017
Vanessa E. Gray; Ronald J. Hause; Douglas M. Fowler
Mutagenesis is a widely used method for identifying protein positions that are important for function or ligand binding. Advances in high-throughput DNA sequencing and mutagenesis techniques have enabled measurement of the effects of nearly all possible amino acid substitutions in many proteins. The resulting large-scale mutagenesis data sets offer a unique opportunity to draw general conclusions about the effects of different amino acid substitutions. Thus, we analyzed 34,373 mutations in 14 proteins whose effects were measured using large-scale mutagenesis approaches. Methionine was the most tolerated substitution, while proline was the least tolerated. We found that several substitutions, including histidine and asparagine, best recapitulated the effects of other substitutions, even when the identity of the wild-type amino acid was considered. The effects of histidine and asparagine substitutions also correlated best with the effects of other substitutions in different structural contexts. Furthermore, highly disruptive substitutions like aspartic and glutamic acid had the most discriminatory power for detecting ligand interface positions. Our work highlights the utility of large-scale mutagenesis data, and our conclusions can help guide future single substitution mutational scans.
Molecular Biology and Evolution | 2016
Li Liu; Koichiro Tamura; Maxwell Sanderford; Vanessa E. Gray; Sudhir Kumar
Widespread sequencing efforts are revealing an unprecedented amount of genomic variation in populations. Such information is routinely used to derive consensus reference sequences and to infer positions subject to natural selection. Here, we present a new molecular evolutionary method for estimating neutral evolutionary probabilities (EPs) of each amino acid or nucleotide state at a genomic position without using intraspecific polymorphism data. Because EPs are derived independently of population-level information, they serve as null expectations that can be used to evaluate selective forces on alleles at both polymorphic and monomorphic positions in populations. We applied this method to coding sequences in the human genome and produced a comprehensive evolutionary variome reference for all human proteins. We found that EPs accurately predict neutral and disease-associated alleles. Through an analysis of discordance between allelic EPs and their observed population frequencies, we discovered thousands of novel candidate sites for nonneutral evolution in human proteins. Many of these were validated in a joint analysis of disease-associated variants and population data. The EP method is also directly applicable to the analysis of noncoding sequences and genomic analyses of nonmodel species.
Molecular Biology and Evolution | 2014
Vanessa E. Gray; Li Liu; Ronika Nirankari; Peter Hornbeck; Sudhir Kumar
Posttranslational modifications (PTMs) regulate molecular structures and functions of proteins by covalently binding to amino acids. Hundreds of thousands of PTMs have been reported for the human proteome, with multiple PTMs known to affect tens of thousands of lysine (K) residues. Our molecular evolutionary analyses show that K residues with multiple PTMs exhibit greater conservation than those with a single PTM, but the difference is rather small. In contrast, short-term evolutionary trends revealed in an analysis of human population variation exhibited a much larger difference. Lysine residues with three PTMs show 1.8-fold enrichment of Mendelian disease-associated variants when compared with K residues with two PTMs, with the latter showing 1.7-fold enrichment of these variants when compared with the K residues with one PTM. Rare polymorphisms in humans show a similar trend, which suggests much greater negative selection against mutations of K residues with multiple PTMs within populations. Conversely, common polymorphisms are overabundant at unmodified K residues and at K residues with fewer PTMs. The observed difference between inter- and intraspecies patterns of purifying selection on residues with PTMs suggests extensive species-specific drifting of PTM positions. These results suggest that the functionality of a protein is likely conserved, without necessarily conserving the PTM positions over evolutionary time.
Cell Systems | 2017
Vanessa E. Gray; Ronald J. Hause; Jens Luebeck; Jay Shendure; Douglas M. Fowler
Large datasets describing the quantitative effects of mutations on protein function are becoming increasingly available. Here, we leverage these datasets to develop Envision, which predicts the magnitude of a missense variant's molecular effect. Envision combines 21,026 variant effect measurements from nine large-scale experimental mutagenesis datasets, a hitherto untapped training resource, with a supervised, stochastic gradient boosting learning algorithm. Envision outperforms other missense variant effect predictors both on large-scale mutagenesis data and on an independent test dataset comprising 2,312 TP53 variants whose effects were measured using a low-throughput approach. This dataset was never used for hyperparameter tuning or model training and thus serves as an independent validation set. Envision prediction accuracy is also more consistent across amino acids than other predictors. Finally, we demonstrate that Envision's performance improves as more large-scale mutagenesis data are incorporated. We precompute Envision predictions for every possible single amino acid variant in human, mouse, frog, zebrafish, fruit fly, worm, and yeast proteomes (https://envision.gs.washington.edu/).
Nature Genetics | 2018
Kenneth A. Matreyek; Lea M. Starita; Jason J. Stephany; Beth Martin; Melissa A. Chiasson; Vanessa E. Gray; Martin Kircher; Arineh Khechaduri; Jennifer N. Dines; Ronald J. Hause; Smita Bhatia; William E. Evans; Mary V. Relling; Wenjian Yang; Jay Shendure; Douglas M. Fowler
Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is applicable to other genes, highlighting its generalizability. VAMP-seq is a scalable assay that measures the effects of missense variants on intracellular protein abundance. Applying VAMP-seq to thousands of PTEN and TPMT variants helps to classify them as pathogenic or benign.
PLOS ONE | 2013
Mia D. Champion; Vanessa E. Gray; Carl F Eberhard; Sudhir Kumar
The evolution of resistance in Staphylococcus aureus occurs rapidly, and in response to all known antimicrobial treatments. Numerous studies of model species describe compensatory roles of mutations in mediating competitive fitness, and there is growing evidence that these mutation types also drive adaptation of S. aureus strains. However, few studies have tracked amino acid changes during the complete evolutionary trajectory of antibiotic adaptation or been able to predict their functional relevance. Here, we have assessed the efficacy of computational methods to predict biological resistance of a collection of clinically known Resistance Associated Mutations (RAMs). We have found that >90% of known RAMs are incorrectly predicted to be functionally neutral by at least one of the prediction methods used. By tracing the evolutionary histories of all of the false negative RAMs, we have discovered that a significant number are reversion mutations to ancestral alleles also carried in the MSSA476 methicillin-sensitive isolate. These genetic reversions are most prevalent in strains following daptomycin treatment and show a tendency to accumulate in biological pathway reactions that are distinct from those accumulating non-reversion mutations. Our studies therefore show that in addition to non-reversion mutations, reversion mutations arise in isolates exposed to new antibiotic treatments. It is possible that acquisition of reversion mutations in the genome may prevent substantial fitness costs during the progression of resistance. Our findings pose an interesting question to be addressed by further clinical studies regarding whether or not these reversion mutations lead to a renewed vulnerability of a vancomycin or daptomycin resistant strain to antibiotics administered at an earlier stage of infection.
bioRxiv | 2017
Vanessa E. Gray; Ronald J. Hause; Douglas M. Fowler
Alanine scanning mutagenesis is a widely used method for identifying protein positions that are important for function or ligand binding. Alanine was chosen because it is physicochemically innocuous and constitutes a deletion of the side chain at the β-carbon. Alanine is also thought to best represent the effects of other mutations; however, this assumption has not been formally tested. To determine whether alanine substitutions are always the best choice, we analyzed 34,373 mutations in fourteen proteins whose effects were measured using large-scale mutagenesis approaches. We found that several substitutions, including histidine and asparagine, are better at recapitulating the effects of other substitutions. Histidine and asparagine also correlated best with the effects of other substitutions in different structural contexts. Furthermore, we found that alanine is among the worst substitutions for detecting ligand interface positions, despite its frequent use for this purpose. Our work highlights the utility of large-scale mutagenesis data and can help to guide future single substitution mutational scans.