Neha Varghese
Joint Genome Institute
Publication
Featured research published by Neha Varghese.
Nucleic Acids Research | 2015
Neha Varghese; Supratim Mukherjee; Natalia Ivanova; Konstantinos T. Konstantinidis; Kostas Mavrommatis; Nikos C. Kyrpides; Amrita Pati
Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required.
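The abstract above combines two pairwise quantities, AF and gANI, and then applies complete linkage clustering. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes one common formulation in which gANI is the length-weighted mean identity over bidirectional best-hit genes and AF is the aligned gene length divided by the total gene length of the query genome. The function names, the input layout, the single-direction treatment of each pair, and the default cutoffs (`min_af=0.6`, `min_gani=96.5`) are illustrative assumptions that the abstract itself does not state.

```python
from itertools import combinations


def pair_metrics(bbh_hits, total_gene_length):
    """Return (AF, gANI) for one genome pair.

    bbh_hits: list of (percent_identity, aligned_length) tuples, one per
    bidirectional best-hit gene (hypothetical input layout).
    total_gene_length: summed length of all genes in the query genome.
    """
    aligned = sum(length for _, length in bbh_hits)
    if aligned == 0:
        return 0.0, 0.0
    # Length-weighted mean identity over the aligned genes.
    gani = sum(pid * length for pid, length in bbh_hits) / aligned
    # Fraction of the query genome's gene length that took part in alignments.
    af = aligned / total_gene_length
    return af, gani


def complete_linkage(genomes, metrics, min_af=0.6, min_gani=96.5):
    """Cluster genomes so that every pair inside a cluster passes both cutoffs.

    metrics maps frozenset({a, b}) -> (AF, gANI). The cutoffs are
    illustrative assumptions, not values taken from the abstract.
    """
    def distance(a, b):
        af, gani = metrics.get(frozenset((a, b)), (0.0, 0.0))
        # A pair that fails the AF cutoff is treated as infinitely distant.
        return 100.0 - gani if af >= min_af else float("inf")

    clusters = [[g] for g in genomes]
    cutoff = 100.0 - min_gani
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the worst pairwise distance.
            d = max(distance(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        if best[0] > cutoff:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data: A-B and A-C pass both cutoffs, B-C does not, D has no hits.
demo_metrics = {
    frozenset(("A", "B")): (0.90, 99.0),
    frozenset(("A", "C")): (0.80, 97.0),
    frozenset(("B", "C")): (0.85, 95.0),
}
print(complete_linkage(["A", "B", "C", "D"], demo_metrics))
# prints [['A', 'B'], ['C'], ['D']]
```

In the toy data, genome C is close to A but not to B, so complete linkage keeps C out of the A-B cluster, whereas single linkage would have merged all three. This is the property that lets the resulting clusters be compared against existing species assignments.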
Nucleic Acids Research | 2017
I-Min A. Chen; Victor Markowitz; Ken Chu; Krishna Palaniappan; Ernest Szeto; Manoj Pillay; Anna Ratner; Jinghua Huang; Evan Andersen; Marcel Huntemann; Neha Varghese; Michalis Hadjithomas; Kristin Tennessen; Torben Nielsen; Natalia Ivanova; Nikos C. Kyrpides
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.
Science | 2017
Sergey Ovchinnikov; Hahnbeom Park; Neha Varghese; Po-Ssu Huang; Georgios A. Pavlopoulos; David E. Kim; Hetunandan Kamisetty; Nikos C. Kyrpides; David Baker
Filling in the protein fold picture
Fewer than a third of the 14,849 known protein families have at least one member with an experimentally determined structure. This leaves more than 5000 protein families with no structural information. Protein modeling using residue-residue contacts inferred from evolutionary data has been successful in modeling unknown structures, but it requires large numbers of aligned sequences. Ovchinnikov et al. augmented such sequence alignments with metagenome sequence data (see the Perspective by Söding). They determined the number of sequences required to allow modeling, developed criteria for model quality, and, where possible, improved modeling by matching predicted contacts to known structures. Their method predicted quality structural models for 614 protein families, of which about 140 represent newly discovered protein folds.
Science, this issue p. 294; see also p. 248
Combining metagenome data with protein structure prediction generates models for 614 families with unknown structures.
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.
Frontiers in Microbiology | 2017
C.W. Beukes; Marike Palmer; Puseletso Manyaka; Wai Y. Chan; Juanita R. Avontuur; Elritha Van Zyl; Marcel Huntemann; Alicia Clum; Manoj Pillay; Krishnaveni Palaniappan; Neha Varghese; Natalia Mikhailova; Dimitrios Stamatis; T. B. K. Reddy; Chris Daum; Nicole Shapiro; Victor Markowitz; Natalia Ivanova; Nikos C. Kyrpides; Tanja Woyke; Jochen Blom; William B. Whitman; Stephanus N. Venter; Emma Theodora Steenkamp
Although the taxonomy of Burkholderia has been extensively scrutinized, significant uncertainty remains regarding the generic boundaries and composition of this large and heterogeneous taxon. Here we used the amino acid and nucleotide sequences of 106 conserved proteins from 92 species to infer robust maximum likelihood phylogenies with which to investigate the generic structure of Burkholderia sensu lato. These data unambiguously supported five distinct lineages, of which four correspond to Burkholderia sensu stricto and the newly introduced genera Paraburkholderia, Caballeronia, and Robbsia. The fifth lineage was represented by P. rhizoxinica. Based on these findings, we propose 13 new combinations for those species previously described as members of Burkholderia but that form part of Caballeronia. These findings also suggest revision of the taxonomic status of P. rhizoxinica as it does not form part of any of the genera currently recognized in Burkholderia sensu lato. From a phylogenetic point of view, Burkholderia sensu stricto has a sister relationship with the Caballeronia+Paraburkholderia clade. Also, the lineages represented by P. rhizoxinica and R. andropogonis, respectively, emerged prior to the radiation of the Burkholderia sensu stricto+Caballeronia+Paraburkholderia clade. Our findings therefore constitute a solid framework, not only for supporting current and future taxonomic decisions, but also for studying the evolution of this assemblage of medically, industrially and agriculturally important species.
Journal of Bacteriology | 2011
I. King Jordan; Andrew B. Conley; Ivan Antonov; Robert A. Arthur; Erin D. Cook; Guy P. Cooper; Bernard L. Jones; Kristen M. Knipe; Kevin J. Lee; Xing Liu; Gabriel J. Mitchell; Pushkar R. Pande; Robert A. Petit; Shaopu Qin; Vani N. Rajan; Shruti Sarda; Aswathy Sebastian; Shiyuyun Tang; Racchit Thapliyal; Neha Varghese; Tianjun Ye; Lee S. Katz; Xin Wang; Lori A. Rowe; Michael Frace; Leonard W. Mayer
We report the first whole-genome sequences for five strains, two carried and three pathogenic, of the emerging pathogen Haemophilus haemolyticus. Preliminary analyses indicate that these genome sequences encode markers that distinguish H. haemolyticus from its closest Haemophilus relatives and provide clues to the identity of its virulence factors.
PLOS ONE | 2011
Daniel H. Haft; Neha Varghese
The rhomboid family of serine proteases occurs in all domains of life. Its members contain at least six hydrophobic membrane-spanning helices, with an active site serine located deep within the hydrophobic interior of the plasma membrane. The model member GlpG from Escherichia coli is heavily studied through engineered mutant forms, varied model substrates, and multiple X-ray crystal studies, yet its relationship to endogenous substrates is not well understood. Here we describe an apparent membrane anchoring C-terminal homology domain that appears in numerous genera including Shewanella, Vibrio, Acinetobacter, and Ralstonia, but excluding Escherichia and Haemophilus. Individual genomes encode up to thirteen members, usually homologous to each other only in this C-terminal region. The domain's tripartite architecture consists of motif, transmembrane helix, and cluster of basic residues at the protein C-terminus, as also seen with the LPXTG recognition sequence for sortase A and the PEP-CTERM recognition sequence for exosortase. Partial Phylogenetic Profiling identifies a distinctive rhomboid-like protease subfamily almost perfectly co-distributed with this recognition sequence. This protease subfamily and its putative target domain are hereby renamed rhombosortase and GlyGly-CTERM, respectively. The protease and target are encoded by consecutive genes in most genomes with just a single target, but far apart otherwise. The signature motif of the Rhombo-CTERM domain, often SGGS, only partially resembles known cleavage sites of rhomboid protease family model substrates. Some protein families that have several members with C-terminal GlyGly-CTERM domains also have additional members with LPXTG or PEP-CTERM domains instead, suggesting there may be common themes to the post-translational processing of these proteins by three different membrane protein superfamilies.
Nature Biotechnology | 2018
Rekha Seshadri; Sinead C. Leahy; Graeme T. Attwood; Koon Hoong Teh; Suzanne C. Lambie; Adrian L. Cookson; Emiley A. Eloe-Fadrosh; Georgios A. Pavlopoulos; Michalis Hadjithomas; Neha Varghese; David Paez-Espino; Nikola Palevich; Peter H. Janssen; Ron S. Ronimus; Samantha Noel; Priya Soni; Kerri Reilly; Todd Atherly; Cherie J. Ziemer; André-Denis G. Wright; Suzanne Ishaq; Michael A. Cotta; Stephanie Thompson; Katie Crosley; Nest McKain; R. John Wallace; Harry J. Flint; Jennifer C. Martin; Robert J Forster; Robert J Gruninger
Productivity of ruminant livestock depends on the rumen microbiota, which ferment indigestible plant polysaccharides into nutrients used for growth. Understanding the functions carried out by the rumen microbiota is important for reducing greenhouse gas production by ruminants and for developing biofuels from lignocellulose. We present 410 cultured bacteria and archaea, together with their reference genomes, representing every cultivated rumen-associated archaeal and bacterial family. We evaluate polysaccharide degradation, short-chain fatty acid production and methanogenesis pathways, and assign specific taxa to functions. A total of 336 organisms were present in available rumen metagenomic data sets, and 134 were present in human gut microbiome data sets. Comparison with the human microbiome revealed rumen-specific enrichment for genes encoding de novo synthesis of vitamin B12, ongoing evolution by gene loss and potential vertical inheritance of the rumen microbiome based on underrepresentation of markers of environmental stress. We estimate that our Hungate genome resource represents ∼75% of the genus-level bacterial and archaeal taxa present in the rumen.
bioRxiv | 2017
David A. Baltrus; Kevin Dougherty; Kayla R. Arendt; Marcel Huntemann; Alicia Clum; Manoj Pillay; Krishnaveni Palaniappan; Neha Varghese; Natalia Mikhailova; Dimitrios Stamatis; T. B. K. Reddy; Chew Yee Ngan; Chris Daum; Nicole Shapiro; Victor Markowitz; Natalia Ivanova; Nikos C. Kyrpides; Tanja Woyke; A. Elizabeth Arnold
Fungi interact closely with bacteria, both on the surfaces of the hyphae and within their living tissues (i.e. endohyphal bacteria, EHB). These EHB can be obligate or facultative symbionts and can mediate diverse phenotypic traits in their hosts. Although EHB have been observed in many lineages of fungi, it remains unclear how widespread and general these associations are, and whether unifying ecological and genomic features can be found across EHB strains as a whole. We cultured 11 bacterial strains after they emerged from the hyphae of diverse Ascomycota that were isolated as foliar endophytes of cupressaceous trees, and generated nearly complete genome sequences for all. Unlike the genomes of largely obligate EHB, the genomes of these facultative EHB resembled those of closely related strains isolated from environmental sources. Although all analysed genomes encoded structures that could be used to interact with eukaryotic hosts, pathways previously implicated in maintenance and establishment of EHB symbiosis were not universally present across all strains. Independent isolation of two nearly identical pairs of strains from different classes of fungi, coupled with recent experimental evidence, suggests horizontal transfer of EHB across endophytic hosts. Given the potential for EHB to influence fungal phenotypes, these genomes could shed light on the mechanisms of plant growth promotion or stress mitigation by fungal endophytes during the symbiotic phase, as well as degradation of plant material during the saprotrophic phase. As such, these findings contribute to the illumination of a new dimension of functional biodiversity in fungi.
Standards in Genomic Sciences | 2016
Rich Boden; Lee P. Hutt; Marcel Huntemann; Alicia Clum; Manoj Pillay; Krishnaveni Palaniappan; Neha Varghese; Natalia Mikhailova; Dimitrios Stamatis; Tatiparthi Reddy; Chew Yee Ngan; Chris Daum; Nicole Shapiro; Victor Markowitz; Natalia Ivanova; Tanja Woyke; Nikos C. Kyrpides
Thermithiobacillus tepidarius DSM 3134T was originally isolated (1983) from the waters of a sulfidic spring entering the Roman Baths (Temple of Sulis-Minerva) at Bath, United Kingdom and is an obligate chemolithoautotroph growing at the expense of reduced sulfur species. This strain has a genome size of 2,958,498 bp. Here we report the genome sequence, annotation and characteristics. The genome comprises 2,902 protein coding and 66 RNA coding genes. Genes responsible for the transaldolase variant of the Calvin-Benson-Bassham cycle were identified along with a biosynthetic horseshoe in lieu of Krebs’ cycle sensu stricto. Terminal oxidases were identified, viz. cytochrome c oxidase (cbb3, EC 1.9.3.1) and ubiquinol oxidase (bd, EC 1.10.3.10). Metalloresistance genes involved in pathways of arsenic and cadmium resistance were found. Evidence of horizontal gene transfer accounting for 5.9 % of the protein-coding genes was found, including transfer from Thiobacillus spp. and Methylococcus capsulatus Bath, isolated from the same spring. A sox gene cluster was found, similar in structure to those from other Acidithiobacillia – by comparison with Thiobacillus thioparus and Paracoccus denitrificans, an additional gene between soxA and soxB was found, annotated as a DUF302-family protein of unknown function. As the Kelly-Friedrich pathway of thiosulfate oxidation (encoded by sox) is not used in Thermithiobacillus spp., the role of the operon (if any) in this species remains unknown. We speculate that DUF302 and sox genes may have a role in periplasmic trithionate oxidation.
PLOS ONE | 2012
Lavanya Rishishwar; Neha Varghese; Eishita Tyagi; Stephen C. Harvey; I. King Jordan; Nael A. McCarty
Cystic fibrosis (CF) is the most common genetic disease among Caucasians, and accordingly the cystic fibrosis transmembrane conductance regulator (CFTR) protein has perhaps the best characterized disease mutation spectrum with more than 1,500 causative mutations having been identified. In this study, we took advantage of that wealth of mutational information in an effort to relate site-specific evolutionary parameters with the propensity and severity of CFTR disease-causing mutations. To do this, we devised a scoring scheme for known CFTR disease-causing mutations based on the Grantham amino acid chemical difference matrix. CFTR site-specific evolutionary constraint values were then computed for seven different evolutionary metrics across a range of increasing evolutionary depths. The CFTR mutational scores and the various site-specific evolutionary constraint values were compared in order to evaluate which evolutionary measures best reflect the disease-causing mutation spectrum. Site-specific evolutionary constraint values from the widely used comparative method PolyPhen2 show the best correlation with the CFTR mutation score spectrum, whereas more straightforward conservation based measures (ConSurf and ScoreCons) show the greatest ability to predict individual CFTR disease-causing mutations. While far greater than could be expected by chance alone, the fraction of the variability in mutation scores explained by the PolyPhen2 metric (3.6%), along with the best set of paired sensitivity (58%) and specificity (60%) values for the prediction of disease-causing residues, were marginal. These data indicate that evolutionary constraint levels are informative but far from determinant with respect to disease-causing mutations in CFTR. 
Nevertheless, this work shows that, when combined with additional lines of evidence, information on site-specific evolutionary conservation can and should be used to guide site-directed mutagenesis experiments by more narrowly defining the set of target residues, resulting in a potential savings of both time and money.
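The paired sensitivity and specificity values quoted in the abstract above follow the standard definitions: sensitivity is the share of disease-causing residues that are flagged, and specificity is the share of the remaining residues that are not flagged. The following minimal Python sketch shows that calculation for a score threshold. The function name, the input layout, the "score at or above the threshold means predicted disease-causing" rule, and the toy numbers are assumptions for illustration; they are not the study's data or pipeline.

```python
def sensitivity_specificity(scores, is_disease, threshold):
    """Return (sensitivity, specificity) for a threshold-based predictor.

    scores: per-residue constraint scores (hypothetical values).
    is_disease: parallel list of booleans, True for residues with a
    known disease-causing mutation.
    A residue is predicted disease-causing when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, disease in zip(scores, is_disease):
        predicted = score >= threshold
        if predicted and disease:
            tp += 1
        elif predicted and not disease:
            fp += 1
        elif not predicted and disease:
            fn += 1
        else:
            tn += 1
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy example: three disease-causing and three other residues.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
toy_labels = [True, True, True, False, False, False]
print(sensitivity_specificity(toy_scores, toy_labels, threshold=0.5))
```

Sweeping `threshold` over the observed score range and recording each pair of values is one way to look for a best paired sensitivity and specificity of the kind the abstract reports.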