
Yunshun Chen

Walter and Eliza Hall Institute of Medical Research



Publication

Featured research published by Yunshun Chen.

Genome Biology | 2014

voom: precision weights unlock linear model analysis tools for RNA-seq read counts

Charity W. Law; Yunshun Chen; Wei Shi; Gordon K. Smyth

New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.

Nucleic Acids Research | 2012

Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation

Davis J. McCarthy; Yunshun Chen; Gordon K. Smyth

A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model methods (GLMs) to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate the ability of adjusted profile likelihood estimators to return accurate estimators of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming common dispersion or separate genewise dispersion. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.

Nature Protocols | 2013

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

Simon Anders; Davis J. McCarthy; Yunshun Chen; Michal Okoniewski; Gordon K. Smyth; Wolfgang Huber; Mark D. Robinson

RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4–10 samples) can be <1 h, with computation time <1 d using a standard desktop PC.

Archive | 2014

Differential Expression Analysis of Complex RNA-seq Experiments Using edgeR

Yunshun Chen; Aaron T. L. Lun; Gordon K. Smyth

This article reviews the statistical theory underlying the edgeR software package for differential expression of RNA-seq data. Negative binomial models are used to capture the quadratic mean-variance relationship that can be observed in RNA-seq data. Conditional likelihood methods are used to avoid bias when estimating the level of variation. Empirical Bayes methods are used to allow gene-specific variation estimates even when the number of replicate samples is very small. Generalized linear models are used to accommodate arbitrarily complex designs. A key feature of the edgeR package is the use of weighted likelihood methods to implement a flexible empirical Bayes approach in the absence of easily tractable sampling distributions. The methodology is implemented in flexible software that is easy to use even for users who are not professional statisticians or bioinformaticians. The software is part of the Bioconductor project.
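The quadratic mean-variance relationship mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard negative binomial parameterization, in which the variance equals the mean plus the dispersion times the squared mean; the function names are hypothetical and are not part of edgeR, which is an R package.

```python
import math


def nb_variance(mean: float, dispersion: float) -> float:
    """Variance of a negative binomial count under the assumed
    parameterization: var = mean + dispersion * mean**2.

    The first term is the Poisson-like part; the second term grows
    quadratically with the mean.
    """
    if mean < 0 or dispersion < 0:
        raise ValueError("mean and dispersion must be non-negative")
    return mean + dispersion * mean ** 2


def coefficient_of_variation(dispersion: float) -> float:
    """Square root of the dispersion: the coefficient of variation
    that remains once the Poisson-like term becomes negligible."""
    if dispersion < 0:
        raise ValueError("dispersion must be non-negative")
    return math.sqrt(dispersion)


if __name__ == "__main__":
    # Illustrative values only, not taken from any of the listed papers.
    for mu in (10.0, 100.0, 1000.0):
        print(mu, nb_variance(mu, dispersion=0.25))
```

With a dispersion of zero the variance reduces to the mean, which is the Poisson case; as the mean grows, the quadratic term dominates.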

Methods in Molecular Biology | 2016

It's DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR

Aaron T. L. Lun; Yunshun Chen; Gordon K. Smyth

RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.

F1000Research | 2016

From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline

Yunshun Chen; Aaron T. L. Lun; Gordon K. Smyth

In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification is conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

Proceedings of the National Academy of Sciences of the United States of America | 2014

Regulation of germinal center responses and B-cell memory by the chromatin modifier MOZ

Kim L. Good-Jacobson; Yunshun Chen; Anne K. Voss; Gordon K. Smyth; Tim Thomas; David M. Tarlinton

Significance: Understanding the intrinsic mechanisms underlying formation of humoral memory during infection and the ability of the humoral memory population to persist for long periods of time is important for identifying potential targets for improving the efficacy of vaccines. Here we demonstrate that the chromatin modifier MOZ regulates B-cell memory formation, controlling memory compartment composition. This activity of MOZ is B cell-intrinsic and is required for establishing the germinal center gene expression program. IgM memory B cells have been implicated in maintaining the memory B-cell population over time, whereas isotype-switched memory B cells rapidly differentiate into plasmablasts upon secondary infection. Therefore, identifying potential chromatin changes that may induce differentiation into one subset over another makes MOZ a viable target for clinical translation.

Memory B cells and long-lived bone marrow-resident plasma cells maintain humoral immunity. Little is known about the intrinsic mechanisms that are essential for forming memory B cells or endowing them with the ability to rapidly differentiate upon reexposure while maintaining the population over time. Histone modifications have been shown to regulate lymphocyte development, but their role in regulating differentiation and maintenance of B-cell subsets during an immune response is unclear. Using stage-specific deletion of monocytic leukemia zinc finger protein (MOZ), a histone acetyltransferase, we demonstrate that mutation of this chromatin modifier alters fate decisions in both primary and secondary responses. In the absence of MOZ, germinal center B cells were significantly impaired in their ability to generate dark zone centroblasts, with a concomitant decrease in both cell-cycle progression and BCL-6 expression. In contrast, there was increased differentiation to IgM and low-affinity IgG1+ memory B cells. The lack of MOZ affected the functional outcome of humoral immune responses, with an increase in secondary germinal centers and a corresponding decrease in secondary high-affinity antibody-secreting cell formation. Therefore, these data provide strong evidence that manipulating epigenetic modifiers can regulate fate decisions during humoral responses, and thus could be targeted for therapeutic intervention.

Breast Cancer Research | 2015

Integration of microRNA signatures of distinct mammary epithelial cell types with their gene expression and epigenetic portraits

Bhupinder Pal; Yunshun Chen; Andrew G. Bert; Yifang Hu; Julie Sheridan; Tamara Beck; Wei Shi; Keith Satterley; Paul R. Jamieson; Gregory J. Goodall; Geoffrey J. Lindeman; Gordon K. Smyth; Jane E. Visvader

Introduction: MicroRNAs (miRNAs) have been implicated in governing lineage specification and differentiation in multiple organs; however, little is known about their specific roles in mammopoiesis. We have determined the global miRNA expression profiles of functionally distinct epithelial subpopulations in mouse and human mammary tissue, and compared these to their cognate transcriptomes and epigenomes. Finally, the human miRNA signatures were used to interrogate the different subtypes of breast cancer, with a view to determining miRNA networks deregulated during oncogenesis.

Methods: RNA from sorted mouse and human mammary cell subpopulations was subjected to miRNA expression analysis using the TaqMan MicroRNA Array. Differentially expressed (DE) miRNAs were correlated with gene expression and histone methylation profiles. Analysis of miRNA signatures of the intrinsic subtypes of breast cancer in The Cancer Genome Atlas (TCGA) database versus those of normal human epithelial subpopulations was performed.

Results: Unique miRNA signatures characterized each subset (mammary stem cell (MaSC)/basal, luminal progenitor, mature luminal, stromal), with a high degree of conservation across species. Comparison of miRNA and transcriptome profiles for the epithelial subtypes revealed an inverse relationship and pinpointed key developmental genes. Interestingly, expression of the primate-specific miRNA cluster (19q13.4) was found to be restricted to the MaSC/basal subset. Comparative analysis of miRNA signatures with H3 lysine modification maps of the different epithelial subsets revealed a tight correlation between active or repressive marks for the top DE miRNAs, including derepression of miRNAs in Ezh2-deficient cellular subsets. Interrogation of TCGA-identified miRNA profiles with the miRNA signatures of different human subsets revealed specific relationships.

Conclusions: The derivation of global miRNA expression profiles for the different mammary subpopulations provides a comprehensive resource for understanding the interplay between miRNA networks and target gene expression. These data have highlighted lineage-specific miRNAs and potential miRNA–mRNA networks, some of which are disrupted in neoplasia. Furthermore, our findings suggest that key developmental miRNAs are regulated by global changes in histone modification, thus linking the mammary epigenome with genome-wide changes in the expression of genes and miRNAs. Comparative miRNA signature analyses between normal breast epithelial cells and breast tumors confirmed an important linkage between luminal progenitor cells and basal-like tumors.

Nature Communications | 2017

Construction of developmental lineage relationships in the mouse mammary gland by single-cell RNA profiling

Bhupinder Pal; Yunshun Chen; François Vaillant; Paul R. Jamieson; Lavinia Gordon; Anne C. Rios; Stephen Wilcox; Nai Yang Fu; Kevin H. Liu; Felicity C. Jackling; Melissa J. Davis; Geoffrey J. Lindeman; Gordon K. Smyth; Jane E. Visvader

The mammary epithelium comprises two primary cellular lineages, but the degree of heterogeneity within these compartments and their lineage relationships during development remain an open question. Here we report single-cell RNA profiling of mouse mammary epithelial cells spanning four developmental stages in the post-natal gland. Notably, the epithelium undergoes a large-scale shift in gene expression from a relatively homogeneous basal-like program in pre-puberty to distinct lineage-restricted programs in puberty. Interrogation of single-cell transcriptomes reveals different levels of diversity within the luminal and basal compartments, and identifies an early progenitor subset marked by CD55. Moreover, we uncover a luminal transit population and a rare mixed-lineage cluster amongst basal cells in the adult mammary gland. Together these findings point to a developmental hierarchy in which a basal-like gene expression program prevails in the early post-natal gland prior to the specification of distinct lineage signatures, and the presence of cellular intermediates that may serve as transit or lineage-primed cells.

The mammary epithelium comprises two cell lineages but the heterogeneity amongst these during development is unclear. Here, the authors report single-cell RNA sequencing of the mouse mammary epithelium at four developmental stages, revealing diversity in both compartments and a transcriptional shift with puberty onset.

PLOS Biology | 2017

Lung Basal Stem Cells Rapidly Repair DNA Damage Using the Error-Prone Nonhomologous End-Joining Pathway

Clare E. Weeden; Yunshun Chen; Stephen Ma; Yifang Hu; Georg Ramm; Kate D. Sutherland; Gordon K. Smyth; Marie-Liesse Asselin-Labat

Lung squamous cell carcinoma (SqCC), the second most common subtype of lung cancer, is strongly associated with tobacco smoking and exhibits genomic instability. The cellular origins and molecular processes that contribute to SqCC formation are largely unexplored. Here we show that human basal stem cells (BSCs) isolated from heavy smokers proliferate extensively, whereas their alveolar progenitor cell counterparts have limited colony-forming capacity. We demonstrate that this difference arises in part because of the ability of BSCs to repair their DNA more efficiently than alveolar cells following ionizing radiation or chemical-induced DNA damage. Analysis of mice harbouring a mutation in the DNA-dependent protein kinase catalytic subunit (DNA-PKcs), a key enzyme in DNA damage repair by nonhomologous end joining (NHEJ), indicated that BSCs preferentially repair their DNA by this error-prone process. Interestingly, polyploidy, a phenomenon associated with genetically unstable cells, was only observed in the human BSC subset. Expression signature analysis indicated that BSCs are the likely cells of origin of human SqCC and that high levels of NHEJ genes in SqCC are correlated with increasing genomic instability. Hence, our results favour a model in which heavy smoking promotes proliferation of BSCs, and their predilection for error-prone NHEJ could lead to the high mutagenic burden that culminates in SqCC. Targeting DNA repair processes may therefore have a role in the prevention and therapy of SqCC.
