Gaurav D. Moghe
Michigan State University
Publication
Featured research published by Gaurav D. Moghe.
Plant Physiology | 2014
Michael S. Campbell; MeiYee Law; Carson Holt; Joshua C. Stein; Gaurav D. Moghe; David E. Hufnagel; Jikai Lei; Rujira Achawanantakun; Dian Jiao; Carolyn J. Lawrence; Doreen Ware; Shin Han Shiu; Kevin L. Childs; Yanni Sun; Ning Jiang; Mark Yandell
MAKER-P annotates the entire Arabidopsis and maize genomes in less than 3 h with comparable quality to the current TAIR10 and maize V2 annotation builds. We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, noncoding RNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software tool kit, MAKER-P, using the Arabidopsis (Arabidopsis thaliana) and maize (Zea mays) genomes. Here, we demonstrate the ability of the MAKER-P tool kit to automatically update, extend, and revise the Arabidopsis annotations in light of newly available data and to annotate pseudogenes and noncoding RNAs absent from The Arabidopsis Information Resource 10 (TAIR10) build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even Arabidopsis, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P on the Texas Advanced Computing Center. We show that this public resource can de novo annotate the entire Arabidopsis and maize genomes in less than 3 h and produce annotations of comparable quality to those of the current TAIR10 and maize V2 annotation builds.
Plant Journal | 2012
Rebecca M. Davidson; Malali Gowda; Gaurav D. Moghe; Haining Lin; Brieanne Vaillancourt; Shin Han Shiu; Ning Jiang; C. Robin Buell
The Poaceae family, also known as the grasses, includes agronomically important cereal crops such as rice, maize, sorghum, and wheat. Previous comparative studies have shown that much of the gene content is shared among the grasses; however, functional conservation of orthologous genes has yet to be explored. To gain an understanding of the genome-wide patterns of evolution of gene expression across reproductive tissues, we employed a sequence-based approach to compare analogous transcriptomes in species representing three Poaceae subgroups including the Pooideae (Brachypodium distachyon), the Panicoideae (sorghum), and the Ehrhartoideae (rice). Our transcriptome analyses reveal that only a fraction of orthologous genes exhibit conserved expression patterns. A high proportion of conserved orthologs include genes that are upregulated in physiologically similar tissues such as leaves, anther, pistil, and embryo, while orthologs that are highly expressed in seeds show the most diverged expression patterns. More generally, we show that evolution of gene expression profiles and coding sequences in the grasses may be linked. Genes that are highly and broadly expressed tend to be conserved at the coding sequence level while genes with narrow expression patterns show accelerated rates of sequence evolution. We further show that orthologs in syntenic genomic blocks are more likely to share correlated expression patterns compared with non-syntenic orthologs. These findings are important for agricultural improvement because sequence information is transferred from model species, such as Brachypodium, rice, and sorghum to crop plants without sequenced genomes.
The Plant Cell | 2014
Gaurav D. Moghe; David E. Hufnagel; Haibao Tang; Yongli Xiao; Ian Dworkin; Christopher D. Town; Jeffrey K. Conner; Shin Han Shiu
This work compares the genomes of four Brassicaceae species to examine the patterns of gene gains and losses following whole-genome duplication, finding that retained genes showed substantial divergence in sequence, expression, function, and network connectivity. This information was used to establish a statistical learning model for predicting whether a duplicate would be retained postpolyploidization. Polyploidization events are frequent among flowering plants, and the duplicate genes produced via such events contribute significantly to plant evolution. We sequenced the genome of wild radish (Raphanus raphanistrum), a Brassicaceae species that experienced a whole-genome triplication event prior to diverging from Brassica rapa. Despite substantial gene gains in these two species compared with Arabidopsis thaliana and Arabidopsis lyrata, ∼70% of the orthologous groups experienced gene losses in R. raphanistrum and B. rapa, with most of the losses occurring prior to their divergence. The retained duplicates show substantial divergence in sequence and expression. Based on comparison of A. thaliana and R. raphanistrum ortholog floral expression levels, retained radish duplicates diverged primarily via maintenance of ancestral expression level in one copy and reduction of expression level in others. In addition, retained duplicates differed significantly from genes that reverted to singleton state in function, sequence composition, expression patterns, network connectivity, and rates of evolution. Using these properties, we established a statistical learning model for predicting whether a duplicate would be retained postpolyploidization. Overall, our study provides new insights into the processes of plant duplicate loss, retention, and functional divergence and highlights the need for further understanding factors controlling duplicate gene fate.
BMC Evolutionary Biology | 2010
Haining Lin; Gaurav D. Moghe; Shu-Yuan Ouyang; Amy F. Iezzoni; Shin Han Shiu; Xun Gu; C. Robin Buell
Background: The availability of genome and transcriptome sequences for a number of species permits the identification and characterization of conserved as well as divergent genes such as lineage-specific genes which have no detectable sequence similarity to genes from other lineages. While genes conserved among taxa provide insight into the core processes among species, lineage-specific genes provide insights into evolutionary processes and biological functions that are likely clade or species specific. Results: Comparative analyses using the Arabidopsis thaliana genome and sequences from 178 other species within the Plant Kingdom enabled the identification of 24,624 A. thaliana genes (91.7%) that were termed Evolutionary Conserved (EC) as defined by sequence similarity to a database entry as well as two sets of lineage-specific genes within A. thaliana. One of the A. thaliana lineage-specific gene sets shares sequence similarity only to sequences from species within the Brassicaceae family and is termed Conserved Brassicaceae-Specific Genes (914, 3.4%, CBSG). The other set of A. thaliana lineage-specific genes, the Arabidopsis Lineage-Specific Genes (1,324, 4.9%, ALSG), lacks sequence similarity to any sequence outside A. thaliana. While many CBSGs (76.7%) and ALSGs (52.9%) are transcribed, the majority of the CBSGs (76.1%) and ALSGs (94.4%) have no annotated function. Co-expression analysis indicated significant enrichment of the CBSGs and ALSGs in multiple functional categories suggesting their involvement in a wide range of biological functions. Subcellular localization prediction revealed that the CBSGs were significantly enriched in proteins targeted to the secretory pathway (412, 45.1%). Among the 107 putatively secreted CBSGs with known functions, 67 encode a putative pollen coat protein or cysteine-rich protein with sequence similarity to the S-locus cysteine-rich protein that is the pollen determinant controlling allele specific pollen rejection in self-incompatible Brassicaceae species. Overall, the ALSGs and CBSGs were more highly methylated in floral tissue compared to the ECs. Single Nucleotide Polymorphism (SNP) analysis showed an elevated ratio of non-synonymous to synonymous SNPs within the ALSGs (1.99) and CBSGs (1.65) relative to the EC set (0.92), mainly caused by an elevated number of non-synonymous SNPs, indicating that they are fast-evolving at the protein sequence level. Conclusions: Our analyses suggest that while a significant fraction of the A. thaliana proteome is conserved within the Plant Kingdom, evolutionarily distinct sets of genes that may function in defining biological processes unique to these lineages have arisen within the Brassicaceae and A. thaliana.
The Plant Cell | 2015
Anthony L. Schilmiller; Gaurav D. Moghe; Pengxiang Fan; Banibrata Ghosh; Jing Ning; A. Daniel Jones
ASAT3 is a trichome-specific BAHD-type acyltransferase that adds aliphatic acyl chains to acylsucroses and contributes to phenotypic diversity of acylsugars among the Solanum tomato species. Glandular trichomes from tomato (Solanum lycopersicum) and other species in the Solanaceae produce and secrete a mixture of O-acylsugars (aliphatic esters of sucrose and glucose) that contribute to insect defense. Despite their phylogenetic distribution and diversity, relatively little is known about how these specialized metabolites are synthesized. Mass spectrometric profiling of acylsugars in the S. lycopersicum x Solanum pennellii introgression lines identified a chromosome 11 locus containing a cluster of BAHD acyltransferases with one gene (named Sl-ASAT3) expressed in tip cells of type I trichomes where acylsugars are made. Sl-ASAT3 was shown to encode an acyl-CoA-dependent acyltransferase that catalyzes the transfer of short (four to five carbons) branched acyl chains to the furanose ring of di-acylsucrose acceptors to produce tri-acylsucroses, which can be further acetylated by Sl-ASAT4 (previously Sl-AT2). Among the wild tomatoes, diversity in furanose ring acyl chains on acylsucroses was most striking in Solanum habrochaites. S. habrochaites accessions from Ecuador and northern Peru produced acylsucroses with short (≤C5) or no acyl chains on the furanose ring. Accessions from central and southern Peru had the ability to add short or long (up to C12) acyl chains to the furanose ring. Multiple ASAT3-like sequences were found in most accessions, and their in vitro activities correlated with observed geographical diversity in acylsugar profiles.
Annals of the New York Academy of Sciences | 2014
Gaurav D. Moghe; Shin Han Shiu
Polyploidy is an important force shaping plant genomes. All flowering plants are descendants of an ancestral polyploid species, and up to 70% of extant vascular plant species are believed to be recent polyploids. Over the past century, a significant body of knowledge has accumulated regarding the prevalence and ecology of polyploid plants. In this review, we summarize our current understanding of the causes and molecular consequences of polyploidization in angiosperms. We also provide a discussion on the relationships between polyploidy and adaptation and suggest areas where further research may provide a better understanding of polyploidy.
Plant Physiology | 2015
Gaurav D. Moghe
Specialized metabolic pathways have evolved by recruitment of duplicated genes encoding enzymes of core metabolism and divergence of the encoded activity, by alteration of biochemical regulation, and by changes in gene expression patterns. Plants produce hundreds of thousands of small molecules known as specialized metabolites, many of which are of economic and ecological importance. This remarkable variety is a consequence of the diversity and rapid evolution of specialized metabolic pathways. These novel biosynthetic pathways originate via gene duplication or by functional divergence of existing genes, and they subsequently evolve through selection and/or drift. Studies over the past two decades revealed that diverse specialized metabolic pathways have resulted from the incorporation of primary metabolic enzymes. We discuss examples of enzyme recruitment from primary metabolism and the variety of paths taken by duplicated primary metabolic enzymes toward integration into specialized metabolism. These examples provide insight into processes by which plant specialized metabolic pathways evolve and suggest approaches to discover enzymes of previously uncharacterized metabolic networks.
Plant Physiology | 2015
Jing Ning; Gaurav D. Moghe; Bryan Leong; Jeongwoon Kim; Itai Ofner; Zhenzhen Wang; Christopher Adams; A. Daniel Jones; Dani Zamir
Isopropylmalate synthase3 is a variant Leu biosynthetic enzyme and its diversity affects acylsucrose composition. Acylsugars are insecticidal specialized metabolites produced in the glandular trichomes of plants in the Solanaceae family. In the tomato clade of the Solanum genus, acylsugars consist of aliphatic acids of different chain lengths esterified to sucrose, or less frequently to glucose. Through liquid chromatography-mass spectrometry screening of introgression lines, we previously identified a region of chromosome 8 in the Solanum pennellii LA0716 genome (IL8-1/8-1-1) that causes the cultivated tomato Solanum lycopersicum to shift from producing acylsucroses with abundant 3-methylbutanoic acid acyl chains derived from leucine metabolism to 2-methylpropanoic acid acyl chains derived from valine metabolism. We describe multiple lines of evidence implicating a trichome-expressed gene from this region as playing a role in this shift. S. lycopersicum M82 SlIPMS3 (Solyc08g014230) encodes a functional end product inhibition-insensitive version of the committing enzyme of leucine biosynthesis, isopropylmalate synthase, missing the carboxyl-terminal 160 amino acids. In contrast, the S. pennellii LA0716 IPMS3 allele found in IL8-1/8-1-1 encodes a nonfunctional truncated IPMS protein. M82 transformed with an SlIPMS3 RNA interference construct exhibited an acylsugar profile similar to that of IL8-1-1, whereas the expression of SlIPMS3 in IL8-1-1 partially restored the M82 acylsugar phenotype. These IPMS3 alleles are polymorphic in 14 S. pennellii accessions spread throughout the geographical range of occurrence for this species and are associated with acylsugars containing varying amounts of 2-methylpropanoic acid and 3-methylbutanoic acid acyl chains.
The Plant Cell | 2015
John P Lloyd; Alexander E. Seddon; Gaurav D. Moghe; Matthew C. Simenc; Shin Han Shiu
Essential genes in Arabidopsis thaliana display distinct characteristics that are used to build machine learning models capable of predicting lethal-phenotype genes within and between species. Essential genes represent critical cellular components whose disruption results in lethality. Characteristics shared among essential genes have been uncovered in fungal and metazoan model systems. However, features associated with plant essential genes are largely unknown and the full set of essential genes remains to be discovered in any plant species. Here, we show that essential genes in Arabidopsis thaliana have distinct features useful for constructing within- and cross-species prediction models. Essential genes in A. thaliana are often single copy or derived from older duplications, highly and broadly expressed, slow evolving, and highly connected within molecular networks compared with genes with nonlethal mutant phenotypes. These gene features allowed the application of machine learning methods that predicted known lethal genes as well as an additional 1970 likely essential genes without documented phenotypes. Prediction models from A. thaliana could also be applied to predict Oryza sativa and Saccharomyces cerevisiae essential genes. Importantly, successful predictions drew upon many features, while any single feature was not sufficient. Our findings show that essential genes can be distinguished from genes with nonlethal phenotypes using features that are similar across kingdoms and indicate the possibility for translational application of our approach to species without extensive functional genomic and phenomic resources.
Plant Physiology | 2013
Gaurav D. Moghe; Melissa D. Lehti-Shiu; Alex E. Seddon; Shan Yin; Yani Chen; Piyada Juntawong; Federica Brandizzi; Julia Bailey-Serres; Shin Han Shiu
The Arabidopsis (Arabidopsis thaliana) genome is the most well-annotated plant genome. However, transcriptome sequencing in Arabidopsis continues to suggest the presence of polyadenylated (polyA) transcripts originating from presumed intergenic regions. It is not clear whether these transcripts represent novel noncoding or protein-coding genes. To understand the nature of intergenic polyA transcription, we first assessed its abundance using multiple messenger RNA sequencing data sets. We found 6,545 intergenic transcribed fragments (ITFs) occupying 3.6% of Arabidopsis intergenic space. In contrast to transcribed fragments that map to protein-coding and RNA genes, most ITFs are significantly shorter, are expressed at significantly lower levels, and tend to be more data set specific. A surprisingly large number of ITFs (32.1%) may be protein coding based on evidence of translation. However, our results indicate that these “translated” ITFs tend to be close to and are likely associated with known genes. To investigate if ITFs are under selection and are functional, we assessed ITF conservation through cross-species as well as within-species comparisons. Our analysis reveals that 237 ITFs, including 49 with translation evidence, are under strong selective constraint and relatively distant from annotated features. These ITFs are likely parts of novel genes. However, the selective pressure imposed on most ITFs is similar to that of randomly selected, untranscribed intergenic sequences. Our findings indicate that despite the prevalence of ITFs, apart from the possibility of genomic contamination, many may be background or noisy transcripts derived from “junk” DNA, whose production may be inherent to the process of transcription and which, on rare occasions, may act as catalysts for the creation of novel genes.