Vineet K. Sharma
Indian Institute of Science Education and Research, Bhopal
Publication
Featured research published by Vineet K. Sharma.
Proceedings of the National Academy of Sciences of the United States of America | 2008
Yuichi Hongoh; Vineet K. Sharma; Tulika Prakash; Satoko Noda; Todd D. Taylor; Toshiaki Kudo; Yoshiyuki Sakaki; Atsushi Toyoda; Masahira Hattori; Moriya Ohkuma
Termites harbor a symbiotic gut microbial community that is responsible for their ability to thrive on recalcitrant plant matter. The community comprises diverse microorganisms, most of which are as yet uncultivable; the detailed symbiotic mechanism remains unclear. Here, we present the first complete genome sequence of a termite gut symbiont—an uncultured bacterium named Rs-D17 belonging to the candidate phylum Termite Group 1 (TG1). TG1 is a dominant group in termite guts, found as intracellular symbionts of various cellulolytic protists, without any physiological information. To acquire the complete genome sequence, we collected Rs-D17 cells from only a single host protist cell to minimize their genomic variation and performed isothermal whole-genome amplification. This strategy enabled us to reconstruct a circular chromosome (1,125,857 bp) encoding 761 putative protein-coding genes. The genome additionally contains 121 pseudogenes assigned to categories, such as cell wall biosynthesis, regulators, transporters, and defense mechanisms. Despite its apparent reductive evolution, the ability to synthesize 15 amino acids and various cofactors is retained, some of these genes having been duplicated. Considering that diverse termite-gut protists harbor TG1 bacteria, we suggest that this bacterial group plays a key role in the gut symbiotic system by stably supplying essential nitrogenous compounds deficient in lignocelluloses to their host protists and the termites. Our results provide a breakthrough to clarify the functions of and the interactions among the individual members of this multilayered symbiotic complex.
Science | 2008
Yuichi Hongoh; Vineet K. Sharma; Tulika Prakash; Satoko Noda; Hidehiro Toh; Todd D. Taylor; Toshiaki Kudo; Yoshiyuki Sakaki; Atsushi Toyoda; Masahira Hattori; Moriya Ohkuma
Termites harbor diverse symbiotic gut microorganisms, the majority of which are as yet uncultivable and their interrelationships unclear. Here, we present the complete genome sequence of the uncultured Bacteroidales endosymbiont of the cellulolytic protist Pseudotrichonympha grassii, which accounts for 70% of the bacterial cells in the gut of the termite Coptotermes formosanus. Functional annotation of the chromosome (1,114,206 base pairs) unveiled its ability to fix dinitrogen and recycle putative host nitrogen wastes for biosynthesis of diverse amino acids and cofactors, and import glucose and xylose as energy and carbon sources. Thus, nitrogen fixation and cellulolysis are coupled within the protist cells. This highly evolved symbiotic system probably underlies the ability of the worldwide pest termites Coptotermes to use wood as their sole food.
PLOS ONE | 2010
Tulika Prakash; Vineet K. Sharma; Naoki Adati; Ritsuko Ozawa; Naveen Kumar; Yuichiro Nishida; Takayoshi Fujikake; Tadayuki Takeda; Todd D. Taylor
The ENCODE project has shown that almost every base of the human genome is transcribed. One class of the resulting transcripts arises from conjoined genes, which are formed by combining the exons of two or more distinct (parent) genes lying on the same strand of a chromosome. Only a very limited number of such genes are known, and the definitions and terminology used for them are highly variable in the public databases. In this work, we have computationally identified and manually curated 751 conjoined genes (CGs) in the human genome that are supported by at least one mRNA or EST sequence available in the NCBI database. 353 representative CGs, of which 291 (82%) could be confirmed, were subjected to experimental validation using RT-PCR and sequencing methods. We speculate that these genes arise out of novel functional requirements and are not merely artifacts of transcription, since more than 70% of them are conserved in other vertebrate genomes. The unique splicing patterns exhibited by CGs reveal their possible roles in protein evolution or gene regulation. Novel CGs, for which no transcript is available, could be identified in 80% of randomly selected potential CG forming regions, indicating that their formation is a routine process. Formation of CGs is not limited to human, as we have also identified 270 CGs in mouse and 227 in Drosophila using our approach. Additionally, we propose a novel mechanism for the formation of CGs. Finally, we developed a database, ConjoinG, which contains detailed information about all the CGs (800 in total) identified in the human genome. In summary, our findings reveal new insights about the functionality of CGs in terms of another possible mechanism for gene regulation and genomic evolution and the mechanism leading to their formation.
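The definition of a conjoined gene given in this abstract (a transcript combining exons of two or more parent genes on the same strand) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Feature` record, the half-open coordinate convention, and the simple exon-overlap criterion are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    """Hypothetical record for a gene or transcript (not a ConjoinG schema)."""
    name: str
    chrom: str
    strand: str    # "+" or "-"
    exons: tuple   # tuple of (start, end) half-open intervals

def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def parent_genes(transcript, genes):
    """Names of genes on the same chromosome and strand whose exons
    overlap at least one exon of the transcript."""
    parents = []
    for gene in genes:
        if gene.chrom != transcript.chrom or gene.strand != transcript.strand:
            continue
        if any(overlaps(te, ge) for te in transcript.exons for ge in gene.exons):
            parents.append(gene.name)
    return parents

def is_conjoined(transcript, genes):
    """A transcript is a conjoined-gene candidate if it shares exons
    with two or more distinct parent genes."""
    return len(parent_genes(transcript, genes)) >= 2
```

A real analysis would also have to handle overlapping or nested gene annotations, which this sketch would flag as false candidates.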
Nucleic Acids Research | 2010
Vineet K. Sharma; Naveen Kumar; Tulika Prakash; Todd D. Taylor
Microbial enzymes have many known applications as biocatalysts in biotechnology, agriculture, medical and other industries. However, only a few enzymes are currently employed for such commercial applications. In this scenario, the current onslaught of metagenomic data provides a new unexplored treasure trove of genomic wealth that can not only enhance the enzyme repertoire by the discovery of novel commercially useful enzymes (CUEs) but can also reveal better functional variants for existing CUEs. We prepared a catalogue of CUEs using text mining of PubMed abstracts and other publicly available information, and manually curated the data to identify 510 CUEs. Further, in order to identify novel homologues of these CUEs, we identified potential ORFs in publicly available metagenomic datasets from 10 diverse sources. Using this strategy, we have developed a resource called MetaBioME (http://metasystems.riken.jp/metabiome/) that comprises (i) a database of CUEs and (ii) a comprehensive platform to facilitate homology-based computational identification of novel homologous CUEs from metagenomic and bacterial genomic datasets. Using MetaBioME, we have identified several novel homologues to known CUEs that can potentially serve as leads for further experimental verification.
PLOS ONE | 2014
Ankit Gupta; Rohan Kapil; Darshan B. Dhakan; Vineet K. Sharma
The identification of virulent proteins in any de-novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenomes of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both of these tasks is the identification of virulent proteins, since a significant proportion of genomic and metagenomic proteins are novel and as yet unannotated. The currently available tools for identifying virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed MP3, a standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 is developed using an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. It displayed Sensitivity, Specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on a blind dataset constructed using complete proteins. On the two metagenomic blind datasets (Blind A: 51–100 amino acids and Blind B: 30–50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100–150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.
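For readers unfamiliar with the four metrics quoted in this abstract, the sketch below computes them from a confusion matrix using their standard definitions. The counts in the usage note are hypothetical (a balanced set of 100 pathogenic and 100 non-pathogenic proteins is assumed; the abstract does not state the dataset composition), chosen only because they are consistent with the values reported for complete proteins.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # fraction of true positives recovered
    specificity = tn / (tn + fp)              # fraction of true negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

With the hypothetical counts `classification_metrics(tp=92, fn=8, tn=100, fp=0)`, sensitivity is 0.92, specificity 1.0, accuracy 0.96, and MCC rounds to 0.92.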
Journal of Bacteriology | 2011
Hidehiro Toh; Vineet K. Sharma; Kenshiro Oshima; Shinji Kondo; Masahira Hattori; F. Bruce Ward; Andrew Free; Todd D. Taylor
Arcobacter butzleri strain ED-1 is an exoelectrogenic epsilonproteobacterium isolated from the anode biofilm of a microbial fuel cell. Arcobacter sp. strain L dominates the liquid phase of the same fuel cell. Here we report the finished and annotated genome sequences of these organisms.
PLOS ONE | 2015
Nikhil Chaudhary; Ashok K. Sharma; Piyush Agarwal; Ankit Gupta; Vineet K. Sharma
The diversity of microbial species in a metagenomic study is commonly assessed using 16S rRNA gene sequencing. With the rapid developments in genome sequencing technologies, the focus has shifted towards sequencing the hypervariable regions of the 16S rRNA gene instead of the full-length gene. We therefore developed 16S Classifier, which uses a machine learning method, Random Forest, for fast and accurate taxonomic classification of short hypervariable regions of the 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level. 16S Classifier is freely available at http://metagenomics.iiserb.ac.in/16Sclassifier and http://metabiosys.iiserb.ac.in/16Sclassifier.
PLOS ONE | 2012
Vineet K. Sharma; Naveen Kumar; Tulika Prakash; Todd D. Taylor
Taxonomic assignment of sequence reads is a challenging task in metagenomic data analysis, for which the present methods mainly use either composition- or homology-based approaches. Though the homology-based methods are more sensitive and accurate, they suffer primarily due to the time needed to generate the Blast alignments. We developed the MetaBin program and web server for better homology-based taxonomic assignments using an ORF-based approach. By implementing Blat as the faster alignment method in place of Blastx, the analysis time has been reduced by severalfold. It is benchmarked using both simulated and real metagenomic datasets, and can be used for both single and paired-end sequence reads of varying lengths (≥45 bp). To our knowledge, MetaBin is the only available program that can be used for the taxonomic binning of short reads (<100 bp) with high accuracy and high sensitivity using a homology-based approach. The MetaBin web server can be used to carry out the taxonomic analysis, by either submitting reads or Blastx output. It provides several options including construction of taxonomic trees, creation of a composition chart, functional analysis using COGs, and comparative analysis of multiple metagenomic datasets. MetaBin web server and a standalone version for high-throughput analysis are available freely at http://metabin.riken.jp/.
BMC Genomics | 2005
Vineet K. Sharma; Samir K. Brahmachari
Background: The creation of human gene families was facilitated significantly by gene duplication and diversification. The (TG/CA)n repeats exhibit length variability, display genome-wide distribution, and are abundant in the human genome. Accumulating evidence for their multiple functional roles, including regulation of transcription and stimulation of recombination and splicing, marks them as functional elements. Here, we report an analysis of the distribution of (TG/CA)n repeats in human gene families. Results: The 1,317 human gene families were classified into six functional classes. The distribution of (TG/CA)n repeats was analyzed both from a global perspective and from a stratified perspective based on their biological properties. The number of genes with repeats decreased with increasing repeat length, and several genes (53%) had repeats of multiple types in various combinations. Repeats were positively associated with the class of Signaling and communication, whereas they were negatively associated with the classes of Immune and related functions and of Information. The proportion of genes with (TG/CA)n repeats in each class was proportional to the corresponding average gene length. The repeat distribution pattern in large gene families generally mirrored the global distribution pattern but differed particularly for the Collagen gene family, which was rich in repeats. The position and flanking sequences of the repeats of Collagen genes showed high conservation in the Chimpanzee genome. However, the majority of these repeats displayed length polymorphism. Conclusion: Positive association of repeats with genes of Signaling and communication points to their role in modulation of transcription. Negative association of repeats in genes of Information relates to the smaller gene length, higher expression and fundamental role in cellular physiology. In genes of Immune and related functions, negative association of repeats perhaps relates to the smaller gene length and the directional nature of the recombinogenic processes to generate immune diversity. Thus, multiple factors including gene length, function and directionality of recombinogenic processes steered the observed distribution of (TG/CA)n repeats. Furthermore, the distribution of repeat patterns is consistent with the current model that long repeats tend to contract more than expand, whereas the reverse dynamics operates in short repeats.
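As a rough illustration of what locating (TG/CA)n repeats in a sequence involves, the sketch below scans one strand for uninterrupted TG or CA runs (a CA run on one strand corresponds to a TG run on the complementary strand). The function name and the `min_units` threshold of 6 are assumptions for illustration; the abstract does not give the repeat-length cut-offs or the search method used in the study.

```python
import re

def find_tg_ca_repeats(seq, min_units=6):
    """Return (start, unit, n_units) for each uninterrupted TG or CA run
    of at least `min_units` dinucleotide units (0-based start)."""
    # Hypothetical threshold: runs shorter than min_units are ignored.
    pattern = re.compile(r"(?:TG){%d,}|(?:CA){%d,}" % (min_units, min_units))
    hits = []
    for match in pattern.finditer(seq.upper()):
        run = match.group()
        hits.append((match.start(), run[:2], len(run) // 2))
    return hits
```

Grouping the returned `n_units` values into length classes would then give a per-gene repeat profile comparable in spirit to the stratified analysis described in the abstract.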
Frontiers in Microbiology | 2017
Rituja Saxena; Darshan B. Dhakan; Parul Mittal; Prashant Waiker; Anirban Chowdhury; Arundhuti Ghatak; Vineet K. Sharma
Extreme ecosystems such as hot springs are of great interest as a source of novel extremophilic species, enzymes, metabolic functions for survival and biotechnological products. India harbors hundreds of hot springs, the majority of which are not yet explored and require comprehensive studies to unravel their unknown and untapped phylogenetic and functional diversity. The aim of this study was to perform a large-scale metagenomic analysis of three major hot springs located in central India namely, Badi Anhoni, Chhoti Anhoni, and Tattapani at two geographically distinct regions (Anhoni and Tattapani), to uncover the resident microbial community and their metabolic traits. Samples were collected from seven distinct sites of the three hot spring locations with temperature ranging from 43.5 to 98°C. The 16S rRNA gene amplicon sequencing of V3 hypervariable region and shotgun metagenome sequencing uncovered a unique taxonomic and metabolic diversity of the resident thermophilic microbial community in these hot springs. Genes associated with hydrocarbon degradation pathways, such as benzoate, xylene, toluene, and benzene were observed to be abundant in the Anhoni hot springs (43.5–55°C), dominated by Pseudomonas stutzeri and Acidovorax sp., suggesting the presence of chemoorganotrophic thermophilic community with the ability to utilize complex hydrocarbons as a source of energy. A high abundance of genes belonging to methane metabolism pathway was observed at Chhoti Anhoni hot spring, where methane is reported to constitute >80% of all the emitted gases, which was marked by the high abundance of Methylococcus capsulatus. The Tattapani hot spring, with a high-temperature range (61.5–98°C), displayed a lower microbial diversity and was primarily dominated by a nitrate-reducing archaeal species Pyrobaculum aerophilum. A higher abundance of cell metabolism pathways essential for the microbial survival in extreme conditions was observed at Tattapani. 
Taken together, the results of this study reveal a novel consortium of microbes, genes, and pathways associated with the hot spring environment.