Yanming Di | Researchain

Publication

Featured research published by Yanming Di.

PLOS Genetics | 2013

The Genome of Tolypocladium inflatum: Evolution, Organization, and Expression of the Cyclosporin Biosynthetic Gene Cluster

Kathryn E. Bushley; Rajani Raja; Pankaj Jaiswal; Jason S. Cumbie; Mariko Nonogaki; Alexander E. Boyd; C. Alisha Owensby; Brian J. Knaus; Justin Elser; Daniel Miller; Yanming Di; Kerry L. McPhail; Joseph W. Spatafora

The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes for cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins, etc.), and demonstrate the roles of module duplication and gene fusion in diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, which is a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high levels of expression of the gene cluster cyclophilin. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further investigations of the role of secondary metabolite gene clusters and their metabolites in fungal biology.

PLOS ONE | 2011

GENE-Counter: A Computational Pipeline for the Analysis of RNA-Seq Data for Gene Expression Differences

Jason S. Cumbie; Jeffrey A. Kimbrel; Yanming Di; Daniel W. Schafer; Larry J. Wilhelm; Samuel E. Fox; Christopher M. Sullivan; Aron D. Curzon; James C. Carrington; Todd C. Mockler; Jeff H. Chang

GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts.

BMC Plant Biology | 2013

Methylome reorganization during in vitro dedifferentiation and regeneration of Populus trichocarpa

Kelly J. Vining; Kyle R. Pomraning; Larry J. Wilhelm; Cathleen Ma; Matteo Pellegrini; Yanming Di; Todd C. Mockler; Michael Freitag; Steven H. Strauss

Background: Cytosine DNA methylation (5mC) is an epigenetic modification that is important to genome stability and regulation of gene expression. Perturbations of 5mC have been implicated as a cause of phenotypic variation among plants regenerated through in vitro culture systems. However, the pattern of change in 5mC and its functional role with respect to gene expression are poorly understood at the genome scale. A fuller understanding of how 5mC changes during in vitro manipulation may aid the development of methods for reducing or amplifying the mutagenic and epigenetic effects of in vitro culture and plant transformation. Results: We investigated the in vitro methylome of the model tree species Populus trichocarpa in a system that mimics routine methods for regeneration and plant transformation in the genus Populus (poplar). Using methylated DNA immunoprecipitation followed by high-throughput sequencing (MeDIP-seq), we compared the methylomes of internode stem segments from micropropagated explants, dedifferentiated calli, and internodes from regenerated plants. We found that more than half (56%) of the methylated portion of the genome appeared to be differentially methylated among the three tissue types. Surprisingly, gene promoter methylation varied little among tissues; however, the percentage of body-methylated genes increased from 9% to 14% between explants and callus tissue, then decreased to 8% in regenerated internodes. Forty-five percent of differentially-methylated genes underwent transient methylation, becoming methylated in calli, and demethylated in regenerants. These genes were more frequent in chromosomal regions with higher gene density. Comparisons with an expression microarray dataset showed that genes methylated at both promoters and gene bodies had lower expression than genes that were unmethylated or only promoter-methylated in all three tissues. Four types of abundant transposable elements showed their highest levels of 5mC in regenerated internodes. Conclusions: DNA methylation varies in a highly gene- and chromosome-differential manner during in vitro differentiation and regeneration. 5mC in redifferentiated tissues was not reset to that in original explants during the study period. Hypermethylation of gene bodies in dedifferentiated cells did not interfere with transcription, and may serve a protective role against activation of abundant transposable elements.

PLOS ONE | 2012

Length Bias Correction in Gene Ontology Enrichment Analysis Using Logistic Regression

Gu Mi; Yanming Di; Sarah C. Emerson; Jason S. Cumbie; Jeff H. Chang

When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called “length bias”, will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
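
The adjustment described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes one plausible formulation in which category membership is the response and a differential expression score plus log transcript length are the predictors. All data are simulated, and the names `fit_logistic`, `de_score`, `log_len`, and `in_category` are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic regression; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

rng = np.random.default_rng(0)
n_genes = 5000

# Simulated, standardized log transcript lengths (assumption, not real data).
log_len = rng.normal(0.0, 1.0, n_genes)

# Length bias: the DE significance score grows with length only.
de_score = 1.0 * log_len + rng.normal(0.0, 1.0, n_genes)

# Category membership also depends on length only, so any marginal
# association between membership and de_score is due to confounding.
p_member = 1.0 / (1.0 + np.exp(-(-1.5 + 1.0 * log_len)))
in_category = rng.binomial(1, p_member)

ones = np.ones(n_genes)
beta_unadjusted = fit_logistic(np.column_stack([ones, de_score]), in_category)
beta_adjusted = fit_logistic(np.column_stack([ones, de_score, log_len]), in_category)

print("DE coefficient without length covariate:", beta_unadjusted[1])
print("DE coefficient with length covariate:   ", beta_adjusted[1])
```

In this simulated setting the coefficient on the DE score is clearly positive when length is omitted and shrinks toward zero once log length enters the model, which is the confounding pattern the abstract describes.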

Statistical Applications in Genetics and Molecular Biology | 2013

Higher order asymptotics for negative binomial regression inferences from RNA-sequencing data

Yanming Di; Sarah C. Emerson; Daniel W. Schafer; Jeffrey A. Kimbrel; Jeff H. Chang

RNA sequencing (RNA-Seq) is the current method of choice for characterizing transcriptomes and quantifying gene expression changes. This next generation sequencing-based method provides unprecedented depth and resolution. The negative binomial (NB) probability distribution has been shown to be a useful model for frequencies of mapped RNA-Seq reads and consequently provides a basis for statistical analysis of gene expression. Negative binomial exact tests are available for two-group comparisons but do not extend to negative binomial regression analysis, which is important for examining gene expression as a function of explanatory variables and for adjusted group comparisons accounting for other factors. We address the adequacy of available large-sample tests for the small sample sizes typically available from RNA-Seq studies and consider a higher-order asymptotic (HOA) adjustment to likelihood ratio tests. We demonstrate that 1) the HOA-adjusted likelihood ratio test is practically indistinguishable from the exact test in situations where the exact test is available, 2) the type I error of the HOA test matches the nominal specification in regression settings we examined via simulation, and 3) the power of the likelihood ratio test does not appear to be affected by the HOA adjustment. This work helps clarify the accuracy of the unadjusted likelihood ratio test and the degree of improvement available with the HOA adjustment. Furthermore, the HOA test may be preferable even when the exact test is available because it does not require ad hoc library size adjustments.

G3: Genes, Genomes, Genetics | 2016

Differential Expression of Genes Involved in Host Recognition, Attachment, and Degradation in the Mycoparasite Tolypocladium ophioglossoides

C. Alisha Quandt; Yanming Di; Justin Elser; Pankaj Jaiswal; Joseph W. Spatafora

The ability of a fungus to infect novel hosts is dependent on changes in gene content, expression, or regulation. Examining gene expression under simulated host conditions can explore which genes may contribute to host jumping. Insect pathogenesis is the inferred ancestral character state for species of Tolypocladium; however, several species are parasites of truffles, including Tolypocladium ophioglossoides. To identify potentially crucial genes in this interkingdom host switch, T. ophioglossoides was grown on four media conditions: media containing the inner and outer portions of its natural host (truffles of Elaphomyces), cuticles from an ancestral host (beetle), and a rich medium (Yeast Malt). Through high-throughput RNA-Seq of mRNA from these conditions, many differentially expressed genes were identified in the experiment. These included PTH11-related G-protein-coupled receptors (GPCRs) hypothesized to be involved in host recognition, and also found to be upregulated in insect pathogens. A divergent chitinase with a signal peptide was also found to be highly upregulated on media containing truffle tissue, suggesting an exogenous degradative activity in the presence of the truffle host. The adhesin gene, Mad1, was highly expressed on truffle media as well. A BiNGO analysis of overrepresented GO terms from genes expressed during each growth condition found that genes involved in redox reactions and transmembrane transport were the most overrepresented during T. ophioglossoides growth on truffle media, suggesting their importance in growth on fungal tissue as compared to other hosts and environments. Genes involved in secondary metabolism were most highly expressed during growth on insect tissue, suggesting that their products may not be necessary during parasitism of Elaphomyces. This study provides clues into understanding genetic mechanisms underlying the transition from insect to truffle parasitism.

Genes | 2011

RNA-Seq for Plant Pathogenic Bacteria

Jeffrey A. Kimbrel; Yanming Di; Jason S. Cumbie; Jeff H. Chang

The throughput and single-base resolution of RNA-Sequencing (RNA-Seq) have contributed to a dramatic change in transcriptomic-based inquiries and resulted in many new insights into the complexities of bacterial transcriptomes. RNA-Seq could contribute to similar advances in our understanding of plant pathogenic bacteria but it is still a technology under development with limitations and unknowns that need to be considered. Here, we review some new developments for RNA-Seq and highlight recent findings for host-associated bacteria. We also discuss the technical and statistical challenges in the practical application of RNA-Seq for studying bacterial transcriptomes and describe some of the currently available solutions.

PLOS ONE | 2015

Goodness-of-Fit Tests and Model Diagnostics for Negative Binomial Regression of RNA Sequencing Data

Gu Mi; Yanming Di; Daniel W. Schafer

This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
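
The NB mean-variance relationship mentioned above can be made concrete with a short simulation. This sketch assumes the common quadratic parameterization, Var(Y) = mu + phi * mu^2, which may differ from the dispersion models examined in the article. The helper `simulate_nb_counts` and the chosen mean and dispersion values are hypothetical.

```python
import numpy as np

def simulate_nb_counts(mean, dispersion, size, rng):
    """Draw NB counts with the given mean and variance mean + dispersion * mean**2."""
    shape = 1.0 / dispersion          # NB "size" parameter
    prob = shape / (shape + mean)     # success probability in NumPy's parameterization
    return rng.negative_binomial(shape, prob, size=size)

rng = np.random.default_rng(1)
mean, dispersion = 50.0, 0.2          # illustrative values, not estimates from real data

counts = simulate_nb_counts(mean, dispersion, 200_000, rng)
sample_mean = counts.mean()
sample_var = counts.var(ddof=1)
expected_var = mean + dispersion * mean**2

print("sample mean:", sample_mean)
print("sample variance:", sample_var, "expected under NB:", expected_var)
```

Comparing the empirical variance with the value implied by an assumed mean-dispersion model is the basic idea behind checking an NB mean-variance specification; the simulation-based tests in the article are considerably more elaborate than this check.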

Human Heredity | 2009

Conditional Tests for Localizing Trait Genes

Yanming Di; E. A. Thompson

Background/Aims: With pedigree data, genetic linkage can be detected using inheritance vector tests, which explore the discrepancy between the posterior distribution of the inheritance vectors given observed trait values and the prior distribution of the inheritance vectors. In this paper, we propose conditional inheritance vector tests for linkage localization. These conditional tests can also be used to detect additional linkage signals in the presence of previously detected causal genes. Methods: For linkage localization, we propose to perform inheritance vector tests conditioning on the inheritance vectors at two positions bounding a test region. We can detect additional linkage signals by conducting a further conditional test in a region with no previously detected genes. We use randomized p values to extend the marginal and conditional tests when the inheritance vectors cannot be completely determined from genetic marker data. Results: We conduct simulation studies to compare and contrast the marginal and the conditional tests and to demonstrate that randomized p values can capture both the significance and the uncertainty in the test results. Conclusions: The simulation results demonstrate that the proposed conditional tests provide useful localization information, and with informative marker data, the uncertainty in randomized marginal and conditional test results is small.

BMC proceedings | 2011

Power of association tests in the presence of multiple causal variants

Yanming Di; Gu Mi; Luna Sun; Rongrong Dong; Hong Zhu; Lili Peng

We show that the statistical power of a single single-nucleotide polymorphism (SNP) score test for genetic association reflects the cumulative effect of all causal SNPs that are correlated with the test SNP. Statistical significance of a score test can sometimes be explained by the collective effect of weak correlations between the test SNP and multiple causal SNPs. In a finite population, weak but significant correlations between the test SNP and the causal SNPs can arise by chance alone. As a consequence, when a single-SNP score test shows significance, the causal SNPs contributing to the power of the test are not necessarily located near the test SNP, nor do they have to be in linkage disequilibrium with the test SNP. These findings are confirmed with the Genetic Analysis Workshop 17 mini-exome data. The findings of this study highlight the often overlooked importance of long-range and weak linkage disequilibrium in genetic association studies.
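
The cumulative effect described above can be sketched with simulated genotypes. This is an illustration under stated assumptions, not the authors' analysis: the test SNP has no direct effect, each of 20 hypothetical causal SNPs is built to have a correlation of about 0.05 with it, and the score statistic for a quantitative trait is taken as n times the squared sample correlation. All names and parameter values are invented for the example.

```python
import numpy as np

def single_snp_score_stat(genotype, phenotype):
    """Score statistic for a linear-model slope: n * r^2, approximately chi-square(1) under the null."""
    r = np.corrcoef(genotype, phenotype)[0, 1]
    return len(genotype) * r**2

rng = np.random.default_rng(2)
n_subjects, n_causal = 5000, 20
allele_freq, rho, effect = 0.3, 0.05, 0.2   # illustrative values

# Two haplotypes per subject at the test SNP.
test_hap = rng.binomial(1, allele_freq, size=(n_subjects, 2))
test_snp = test_hap.sum(axis=1)

# Each causal SNP copies the test allele with probability rho, otherwise
# draws an independent allele, giving correlation of about rho with the test SNP.
phenotype = rng.normal(0.0, 1.0, n_subjects)
for _ in range(n_causal):
    copy_mask = rng.random((n_subjects, 2)) < rho
    independent = rng.binomial(1, allele_freq, size=(n_subjects, 2))
    causal_snp = np.where(copy_mask, test_hap, independent).sum(axis=1)
    phenotype = phenotype + effect * causal_snp

score_stat = single_snp_score_stat(test_snp, phenotype)
print("score statistic at the non-causal test SNP:", score_stat)
```

Although no single causal SNP is strongly correlated with the test SNP, the statistic here far exceeds 3.84, the 5% critical value of the chi-square distribution with one degree of freedom, because the weak correlations add up.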
