Evan S. Snitkin
Boston University
Publication
Featured research published by Evan S. Snitkin.
Science Translational Medicine | 2012
Evan S. Snitkin; Adrian M. Zelazny; Pamela J. Thomas; Frida Stock; David K. Henderson; Tara N. Palmore; Julia A. Segre
Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing revealed its origin and probable modes of transmission.
A Detective Story
Some infections are largely a thing of the past: plague, syphilis. The unfortunate result of these antibiotic-driven successes is the emergence of drug-resistant pathogens. And, ironically enough, hospitals are at the center of the problem. One example occurred in 2011 at the Clinical Center of the U.S. National Institutes of Health (NIH), where an outbreak of drug-resistant Klebsiella pneumoniae infected 18 patients, causing the death of 6 of them. Using a combination of whole-genome sequencing and patient tracking, Snitkin and his colleagues examined how the bacteria were spreading through the hospital. The results outline a complicated path of transmission within the hospital that defied standard containment methods, yielding lessons for the future.
A patient known to be infected with a drug-resistant form of K. pneumoniae was admitted to the NIH Clinical Center on 13 June 2011. Enhanced isolation procedures were immediately implemented, and no spread of the bacteria was seen for the month she was in the hospital. Although all seemed well, a few weeks later, on 5 August, a second infected patient was discovered, followed by a series of other patients with infection or colonization, about 1 a week, to a total of 18 by the end of 2011. Six people ultimately died as a result of the bacteria. The outbreak was finally contained by rigorous control procedures. A careful survey of the bed locations of each patient did not shed much light on how the bacteria traveled on their deadly path: the first patient did not even come into contact with any of the others. So the authors performed whole-genome sequencing on all of the bacteria that were found, determining the most likely evolutionary relationships among them by comparing the variations at single nucleotides that arise as bacteria grow.
Combining this evolutionary information with the physical tracking of the patients pointed to the most likely transmission scenario. The authors concluded that all of the K. pneumoniae cases likely originated with the index patient, from at least two different sites on her body, rather than from independently introduced bacteria. There were at least three different initial transmission events. Particularly disturbing was the fact that one of the infections could be linked to contamination of a ventilator that had been cleaned by thorough methods. Sophisticated deployment of whole-genome sequencing revealed the weaknesses in this medical whodunit, informing improvements in hospital preventive measures. If applied rapidly, such analysis can even expose the causes of nosocomial infections in real time.
The Gram-negative bacterium Klebsiella pneumoniae is a major cause of nosocomial infections, primarily among immunocompromised patients. The emergence of strains resistant to carbapenems has left few treatment options, making infection containment critical. In 2011, the U.S. National Institutes of Health Clinical Center experienced an outbreak of carbapenem-resistant K. pneumoniae that affected 18 patients, 11 of whom died. Whole-genome sequencing was performed on K. pneumoniae isolates to gain insight into why the outbreak progressed despite early implementation of infection control procedures. Integrated genomic and epidemiological analysis traced the outbreak to three independent transmissions from a single patient who was discharged 3 weeks before the next case became clinically apparent. Additional genomic comparisons provided evidence for unexpected transmission routes, with subsequent mining of epidemiological data pointing to possible explanations for these transmissions. Our analysis demonstrates that integration of genomic and epidemiological data can yield actionable insights and facilitate the control of nosocomial transmission.
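The comparison of single-nucleotide variations between isolates can be illustrated with a minimal sketch. It assumes the isolate genomes have already been aligned to equal length and simply counts differing positions for each pair. The function names, isolate labels, and sequences are hypothetical toy values, not data or methods from the study, which may have used a different approach to infer relationships.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def pairwise_distances(isolates: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Compute the SNP distance for every pair of isolates (names sorted)."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


# Hypothetical toy alignment: three isolates, eight aligned positions.
isolates = {
    "patient1": "ACGTACGT",
    "patient2": "ACGTACGA",
    "patient3": "ACCTACGA",
}
distances = pairwise_distances(isolates)
for pair, d in distances.items():
    print(pair, d)
```

Isolates separated by fewer differences are candidates for being more closely related. A transmission chain would still need the epidemiological data described above to be interpreted.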
Genome Biology | 2009
Bolan Linghu; Evan S. Snitkin; Zhenjun Hu; Yu Xia; Charles DeLisi
We integrate 16 genomic features to construct an evidence-weighted functional-linkage network comprising 21,657 human genes. The functional-linkage network is used to prioritize candidate genes for 110 diseases, and to reliably disclose hidden associations between disease pairs having dissimilar phenotypes, such as hypercholesterolemia and Alzheimer's disease. Many of these disease-disease associations are supported by epidemiology, but with no previous genetic basis. Such associations can drive novel hypotheses on molecular mechanisms of diseases and therapies.
BMC Genomics | 2006
Adam M. Gustafson; Evan S. Snitkin; Stephen C.J. Parker; Charles DeLisi; Simon Kasif
Background: The identification of genes essential for survival is of theoretical importance in the understanding of the minimal requirements for cellular life, and of practical importance in the identification of potential drug targets in novel pathogens. With the great time and expense required for experimental studies aimed at constructing a catalog of essential genes in a given organism, a computational approach which could identify essential genes with high accuracy would be of great value.
Results: We gathered numerous features which could be generated automatically from genome sequence data and assessed their relationship to essentiality, and subsequently utilized machine learning to construct an integrated classifier of essential genes in both S. cerevisiae and E. coli. When looking at single features, phyletic retention, a measure of the number of organisms an ortholog is present in, was the most predictive of essentiality. Furthermore, during construction of our phyletic retention feature we for the first time explored the evolutionary relationship among the set of organisms in which the presence of a gene is most predictive of essentiality. We found that in both E. coli and S. cerevisiae the optimal sets always contain host-associated organisms with small genomes which are closely related to the reference. Using five optimally selected organisms, we were able to improve predictive accuracy as compared to using all available sequenced organisms. We hypothesize the predictive power of these genomes is a consequence of the process of reductive evolution, by which many parasites and symbionts evolved their gene content. In addition, essentiality is measured in rich media, a condition which resembles the environments of these organisms in their hosts where many nutrients are provided. Finally, we demonstrate that integration of our most highly predictive features using a probabilistic classifier resulted in accuracies surpassing any individual feature.
Conclusion: Using features obtainable directly from sequence data, we were able to construct a classifier which can predict essential genes with high accuracy. Furthermore, our analysis of the set of genomes in which the presence of a gene is most predictive of essentiality may suggest ways in which targeted sequencing can be used in the identification of essential genes. In summary, the methods presented here can aid in the reduction of time and money invested in essential gene identification by targeting those genes for experimentation which are predicted as being essential with a high probability.
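Phyletic retention, described above as the number of organisms in which an ortholog is present, can be sketched as a simple count. The data structure, function name, and gene and organism labels below are hypothetical. How orthologs are called and how the organism set is chosen are outside this sketch.

```python
from typing import Dict, Set


def phyletic_retention(
    ortholog_presence: Dict[str, Set[str]], organisms: Set[str]
) -> Dict[str, int]:
    """For each gene, count how many of the chosen organisms carry an ortholog."""
    return {
        gene: len(present_in & organisms)
        for gene, present_in in ortholog_presence.items()
    }


# Hypothetical toy data: gene -> organisms in which an ortholog was found.
presence = {
    "geneA": {"org1", "org2", "org3"},
    "geneB": {"org2"},
    "geneC": set(),
}

# Score against all organisms, then against a smaller selected subset.
scores_all = phyletic_retention(presence, {"org1", "org2", "org3"})
scores_subset = phyletic_retention(presence, {"org1"})
print(scores_all)
print(scores_subset)
```

Restricting the `organisms` argument mirrors the idea of scoring against a small selected set of genomes rather than all available ones.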
Briefings in Bioinformatics | 2008
Zhenjun Hu; Evan S. Snitkin; Charles DeLisi
The essence of a living cell is adaptation to a changing environment, and a central goal of modern cell biology is to understand adaptive change under normal and pathological conditions. Because the number of components is large, and processes and conditions are many, visual tools are useful in providing an overview of relations that would otherwise be far more difficult to assimilate. Historically, representations were static pictures, with genes and proteins represented as nodes, and known or inferred correlations between them (links) represented by various kinds of lines. The modern challenge is to capture functional hierarchies and adaptation to environmental change, and to discover pathways and processes embedded in known data, but not currently recognizable. Among the tools being developed to meet this challenge is VisANT (freely available at http://visant.bu.edu) which integrates, mines and displays hierarchical information. Challenges to integrating modeling (discrete or continuous) and simulation capabilities into such visual mining software are briefly discussed.
Journal of Bacteriology | 2009
Varun Mazumdar; Evan S. Snitkin; Salomon Amar; Daniel Segrè
The microbial community present in the human mouth is engaged in a complex network of diverse metabolic activities. In addition to serving as energy and building-block sources, metabolites are key players in interspecies and host-pathogen interactions. Metabolites are also implicated in triggering the local inflammatory response, which can affect systemic conditions such as atherosclerosis, obesity, and diabetes. While the genomes of several oral pathogens have been sequenced, the metabolic functions of oral pathogens have not yet been explored quantitatively at the system level. Here we pursue the computational construction and analysis of the genome-scale metabolic network of Porphyromonas gingivalis, a gram-negative anaerobe that is endemic in the human population and largely responsible for adult periodontitis. Integrating information from the genome, online databases, and literature screening, we built a stoichiometric model that encompasses 679 metabolic reactions. By using flux balance approaches and automated network visualization, we analyze the growth capacity in amino-acid-rich medium and provide evidence that amino acid preference and cytotoxic by-product secretion rates are suitably reproduced by the model. To provide further insight into the basic metabolic functions of P. gingivalis and suggest potential drug targets, we systematically study how the network responds to any reaction knockout. We focus specifically on the lipopolysaccharide biosynthesis pathway and identify eight putative targets, one of which has been recently verified experimentally. The current model, which is amenable to further experimental testing and refinements, could prove useful in evaluating the oral microbiome dynamics and in the development of novel biomedical applications.
Genome Biology | 2008
Evan S. Snitkin; Aimée M. Dudley; Daniel M. Janse; Kaisheen Wong; George M. Church; Daniel Segrè
Background: Understanding the response of complex biochemical networks to genetic perturbations and environmental variability is a fundamental challenge in biology. Integration of high-throughput experimental assays and genome-scale computational methods is likely to produce insight otherwise unreachable, but specific examples of such integration have only begun to be explored.
Results: In this study, we measured growth phenotypes of 465 Saccharomyces cerevisiae gene deletion mutants under 16 metabolically relevant conditions and integrated them with the corresponding flux balance model predictions. We first used discordance between experimental results and model predictions to guide a stage of experimental refinement, which resulted in a significant improvement in the quality of the experimental data. Next, we used discordance still present in the refined experimental data to assess the reliability of yeast metabolism models under different conditions. In addition to estimating predictive capacity based on growth phenotypes, we sought to explain these discordances by examining predicted flux distributions visualized through a new, freely available platform. This analysis led to insight into the glycerol utilization pathway and the potential effects of metabolic shortcuts on model results. Finally, we used model predictions and experimental data to discriminate between alternative raffinose catabolism routes.
Conclusions: Our study demonstrates how a new level of integration between high throughput measurements and flux balance model predictions can improve understanding of both experimental and computational results. The added value of a joint analysis is a more reliable platform for specific testing of biological hypotheses, such as the catabolic routes of different carbon sources.
Genome Research | 2013
Evan S. Snitkin; Adrian M. Zelazny; Jyoti Gupta; Tara N. Palmore; Patrick R. Murray; Julia A. Segre
Bacterial whole-genome sequencing (WGS) of human pathogens has provided unprecedented insights into the evolution of antibiotic resistance. Most studies have focused on identification of resistance mutations, leaving one to speculate on the fate of these mutants once the antibiotic selective pressure is removed. We performed WGS on longitudinal isolates of Acinetobacter baumannii from patients undergoing colistin treatment, and upon subsequent drug withdrawal. In each of the four patients, colistin resistance evolved via mutations at the pmr locus. Upon colistin withdrawal, an ancestral susceptible strain outcompeted resistant isolates in three of the four cases. In the final case, resistance was also lost, but by a compensatory inactivating mutation in the transcriptional regulator of the pmr locus. Notably, this inactivating mutation reduced the probability of reacquiring colistin resistance when subsequently challenged in vitro. On face value, these results supported an in vivo fitness cost preventing the evolution of stable colistin resistance. However, more careful analysis of WGS data identified genomic evidence for stable colistin resistance undetected by clinical microbiological assays. Transcriptional studies validated this genomic hypothesis, showing increased pmr expression of the initial isolate. Moreover, altering the environmental growth conditions of the clinical assay recapitulated the classification as colistin resistant. Additional targeted sequencing revealed that this isolate evolved undetected in a patient undergoing colistin treatment, and was then transmitted to other hospitalized patients, further demonstrating its stability in the absence of colistin. This study provides a unique window into mutational pathways taken in response to antibiotic pressure in vivo, and demonstrates the potential for genome sequence data to predict resistance phenotypes.
BMC Bioinformatics | 2006
Evan S. Snitkin; Adam M. Gustafson; Joseph C. Mellor; Jie Wu; Charles DeLisi
Background: The rapidly increasing speed with which genome sequence data can be generated will be accompanied by an exponential increase in the number of sequenced eukaryotes. With the increasing number of sequenced eukaryotic genomes comes a need for bioinformatic techniques to aid in functional annotation. Ideally, genome context based techniques such as proximity, fusion, and phylogenetic profiling, which have been so successful in prokaryotes, could be utilized in eukaryotes. Here we explore the application of phylogenetic profiling, a method that exploits the evolutionary co-occurrence of genes in the assignment of functional linkages, to eukaryotic genomes.
Results: In order to evaluate the performance of phylogenetic profiling in eukaryotes, we assessed the relative performance of commonly used profile construction techniques and genome compositions in predicting functional linkages in both prokaryotic and eukaryotic organisms. When predicting linkages in E. coli with a prokaryotic profile, the use of continuous values constructed from transformed BLAST bit-scores performed better than profiles composed of discretized E-values; the use of discretized E-values resulted in more accurate linkages when using S. cerevisiae as the query organism. Extending this analysis by incorporating several eukaryotic genomes in profiles containing a majority of prokaryotes resulted in similar overall accuracy, but with a surprising reduction in pathway diversity among the most significant linkages. Furthermore, the application of phylogenetic profiling using profiles composed of only eukaryotes resulted in the loss of the strong correlation between common KEGG pathway membership and profile similarity score. Profile construction methods, orthology definitions, ontology and domain complexity were explored as possible sources of the poor performance of eukaryotic profiles, but with no improvement in results.
Conclusion: Given the current set of completely sequenced eukaryotic organisms, phylogenetic profiling using profiles generated from any of the commonly used techniques was found to yield extremely poor results. These findings imply genome-specific requirements for constructing functionally relevant phylogenetic profiles, and suggest that differences in the evolutionary history between different kingdoms might generally limit the usefulness of phylogenetic profiling in eukaryotes.
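The idea behind phylogenetic profiling, that genes co-occurring across genomes may be functionally linked, can be illustrated with a small sketch. It assumes binary presence/absence profiles (in the spirit of discretized profiles) and uses the Jaccard index as the similarity score. The scoring choice, function name, and toy profiles are assumptions for illustration, not the measures evaluated in the study.

```python
from typing import Sequence


def jaccard_similarity(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """Jaccard index of two binary presence/absence profiles.

    Each position is one reference genome: 1 if a homolog is present, else 0.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles across four reference genomes.
gene_x = [1, 1, 0, 1]
gene_y = [1, 1, 0, 0]
gene_z = [0, 0, 1, 0]

print(jaccard_similarity(gene_x, gene_y))  # shared presence in 2 of 3 genomes
print(jaccard_similarity(gene_x, gene_z))  # no shared presence
```

Gene pairs with high similarity would be proposed as functionally linked. A continuous profile built from transformed bit-scores would need a different score, such as a correlation.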
PLOS Genetics | 2011
Evan S. Snitkin; Daniel Segrè
An epistatic interaction between two genes occurs when the phenotypic impact of one gene depends on another gene, often exposing a functional association between them. Due to experimental scalability and to evolutionary significance, abundant work has been focused on studying how epistasis affects cellular growth rate, most notably in yeast. However, epistasis likely influences many different phenotypes, affecting our capacity to understand cellular functions, biochemical networks adaptation, and genetic diseases. Despite its broad significance, the extent and nature of epistasis relative to different phenotypes remain fundamentally unexplored. Here we use genome-scale metabolic network modeling to investigate the extent and properties of epistatic interactions relative to multiple phenotypes. Specifically, using an experimentally refined stoichiometric model for Saccharomyces cerevisiae, we computed a three-dimensional matrix of epistatic interactions between any two enzyme gene deletions, with respect to all metabolic flux phenotypes. We found that the total number of epistatic interactions between enzymes increases rapidly as phenotypes are added, plateauing at approximately 80 phenotypes, to an overall connectivity that is roughly 8-fold larger than the one observed relative to growth alone. Looking at interactions across all phenotypes, we found that gene pairs interact incoherently relative to different phenotypes, i.e. antagonistically relative to some phenotypes and synergistically relative to others. Specific deletion-deletion-phenotype triplets can be explained metabolically, suggesting a highly informative role of multi-phenotype epistasis in mapping cellular functions. Finally, we found that genes involved in many interactions across multiple phenotypes are more highly expressed, evolve slower, and tend to be associated with diseases, indicating that the importance of genes is hidden in their total phenotypic impact. 
Our predictions indicate a pervasiveness of nonlinear effects in how genetic perturbations affect multiple metabolic phenotypes. The approaches and results reported could influence future efforts in understanding metabolic diseases and the role of biochemical regulation in the cell.
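A minimal sketch of how an epistasis score might be computed per phenotype and checked for incoherence follows. It assumes the commonly used multiplicative definition, in which each phenotype value is normalized to wild type and epistasis is the double-deletion value minus the product of the single-deletion values. The abstract does not state the exact formula used, so this definition, the sign convention, the function names, and the toy numbers are all assumptions.

```python
from typing import Dict


def epistasis(w_x: float, w_y: float, w_xy: float) -> float:
    """Multiplicative epistasis: deviation of the double deletion from the
    product of single-deletion phenotypes (all normalized to wild type)."""
    return w_xy - w_x * w_y


def is_incoherent(eps_by_phenotype: Dict[str, float], tol: float = 1e-9) -> bool:
    """True if a gene pair shows positive epistasis for some phenotypes
    and negative epistasis for others."""
    has_positive = any(e > tol for e in eps_by_phenotype.values())
    has_negative = any(e < -tol for e in eps_by_phenotype.values())
    return has_positive and has_negative


# Hypothetical normalized phenotypes for one gene pair across three phenotypes.
eps = {
    "growth": epistasis(0.5, 0.5, 0.5),   # double mutant better than expected
    "flux_1": epistasis(0.5, 0.5, 0.0),   # double mutant worse than expected
    "flux_2": epistasis(0.5, 0.5, 0.25),  # exactly as expected, no epistasis
}
print(eps)
print(is_incoherent(eps))
```

Stacking such scores for every gene pair and every phenotype would give a three-dimensional array indexed by (gene, gene, phenotype), the shape of the matrix the abstract describes.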
BMC Bioinformatics | 2008
Bolan Linghu; Evan S. Snitkin; Dustin T. Holloway; Adam M. Gustafson; Yu Xia; Charles DeLisi
Background: Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques, and (2) development of a decision rule for the constructed FLN to optimize functional annotation.
Results: We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods, Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network, all combine heterogeneous data to produce reliable and high-coverage FLNs, in which the linkage weight more accurately estimates functional coupling of linked proteins than individual data sources alone. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverages. In particular, at low coverage, all rules evaluated perform comparably. At coverage above approximately 50%, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas for other methods, precision ranges from a high of slightly more than 30%, down to 3%. In addition, a scoring scheme to estimate the precisions of individual predictions is also provided. Finally, tests of the robustness of the framework indicate that our framework can be successfully applied to less studied organisms.
Conclusion: We provide a general two-step function-annotation framework, and show that high coverage, high precision annotations can be achieved by constructing a high-coverage and reliable FLN via data integration followed by applying a maximum weight decision rule.
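One plausible reading of a maximum weight decision rule is sketched below: an unannotated protein receives the functions of its annotated neighbor with the largest edge weight in the FLN. The abstract does not spell out the rule's details, so the tie-breaking, data structures, function name, and protein and function labels here are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def annotate_by_max_weight(
    neighbors: List[Tuple[str, float]], known: Dict[str, Set[str]]
) -> Optional[Set[str]]:
    """Transfer the functions of the annotated neighbor with the highest
    edge weight; return None if no neighbor has a known annotation."""
    annotated = [(weight, name) for name, weight in neighbors if name in known]
    if not annotated:
        return None
    # max() compares weight first; ties fall back to the neighbor name.
    _, best = max(annotated)
    return known[best]


# Hypothetical FLN neighborhood of one unannotated protein.
neighbors = [("P1", 0.2), ("P2", 0.9), ("P3", 0.95)]
known = {"P1": {"transport"}, "P2": {"DNA repair"}}  # P3 has no annotation

print(annotate_by_max_weight(neighbors, known))
```

Here P3 has the largest weight but no known function, so the rule falls back to P2. Other decision rules, such as weighted voting across all neighbors, would combine the same inputs differently.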