Publication


Featured research published by Joshua D. Welch.


BMC Genomics | 2009

Word-based characterization of promoters involved in human DNA repair pathways

Jens Lichtenberg; Edwin Jacox; Joshua D. Welch; Kyle Kurz; Xiaoyu Liang; Mary Qu Yang; Frank Drews; Klaus Ecker; Stephen S. Lee; Laura Elnitski; Lonnie R. Welch

Background: DNA repair genes provide an important contribution towards the surveillance and repair of DNA damage. These genes produce a large network of interacting proteins whose mRNA expression is likely to be regulated by similar regulatory factors. Full characterization of promoters of DNA repair genes and the similarities among them will more fully elucidate the regulatory networks that activate or inhibit their expression. To address this goal, the authors introduce a technique to find regulatory genomic signatures, which represents a specific application of the genomic signature methodology to classify DNA sequences as putative functional elements within a single organism.
Results: The effectiveness of the regulatory genomic signatures is demonstrated via analysis of promoter sequences for genes in DNA repair pathways of humans. The promoters are divided into two classes, the bidirectional promoters and the unidirectional promoters, and distinct genomic signatures are calculated for each class. The genomic signatures include statistically overrepresented words, word clusters, and co-occurring words. The robustness of this method is confirmed by the ability to identify sequences that exist as motifs in TRANSFAC and JASPAR databases, and in overlap with verified binding sites in this set of promoter regions.
Conclusion: The word-based signatures are shown to be effective by finding occurrences of known regulatory sites. Moreover, the signatures of the bidirectional and unidirectional promoters of human DNA repair pathways are clearly distinct, exhibiting virtually no overlap. In addition to providing an effective characterization method for related DNA sequences, the signatures elucidate putative regulatory aspects of DNA repair pathways, which are notably under-characterized.


Genome Biology | 2016

SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data

Joshua D. Welch; Alexander J. Hartemink; Jan F. Prins

Single cell experiments provide an unprecedented opportunity to reconstruct a sequence of changes in a biological process from individual “snapshots” of cells. However, nonlinear gene expression changes, genes unrelated to the process, and the possibility of branching trajectories make this a challenging problem. We develop SLICER (Selective Locally Linear Inference of Cellular Expression Relationships) to address these challenges. SLICER can infer highly nonlinear trajectories, select genes without prior knowledge of the process, and automatically determine the location and number of branches and loops. SLICER recovers the ordering of points along simulated trajectories more accurately than existing methods. We demonstrate the effectiveness of SLICER on previously published data from mouse lung cells and neural stem cells.
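Ordering cells along a trajectory can be pictured as a shortest-path problem on a nearest-neighbour graph of cells. The sketch below is not the SLICER implementation: it omits gene selection, locally linear embedding, and branch detection, and only orders points by geodesic distance from a chosen start cell. The function names, the default `k`, and the idea of passing plain coordinate tuples are assumptions made for illustration.

```python
import heapq
import math

def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def geodesic_distances(graph, start):
    """Dijkstra shortest-path distances from `start` along graph edges."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def order_cells(points, start, k=2):
    """Return cell indices sorted by geodesic distance from the start cell.

    Cells unreachable from `start` in the k-NN graph are left out.
    """
    graph = knn_graph(points, k)
    dist = geodesic_distances(graph, start)
    return sorted(dist, key=dist.get)
```

Calling `order_cells(points, start)` on points sampled along a curved path returns them in path order, even though straight-line distance from the start cell would not, which is the reason for measuring distance along the graph.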


Nature | 2017

Single-cell transcriptomics reconstructs fate conversion from fibroblast to cardiomyocyte

Ziqing Liu; Li Wang; Joshua D. Welch; Hong Ma; Yang Zhou; Shuo Yu; Joseph Blake Wall; Sahar Alimohamadi; Michael Zheng; Chaoying Yin; Weining Shen; Jan F. Prins; Jiandong Liu; Li Qian

Direct lineage conversion offers a new strategy for tissue regeneration and disease modelling. Despite recent success in directly reprogramming fibroblasts into various cell types, the precise changes that occur as fibroblasts progressively convert to the target cell fates remain unclear. The inherent heterogeneity and asynchronous nature of the reprogramming process renders it difficult to study this process using bulk genomic techniques. Here we used single-cell RNA sequencing to overcome this limitation and analysed global transcriptome changes at early stages during the reprogramming of mouse fibroblasts into induced cardiomyocytes (iCMs). Using unsupervised dimensionality reduction and clustering algorithms, we identified molecularly distinct subpopulations of cells during reprogramming. We also constructed routes of iCM formation, and delineated the relationship between cell proliferation and iCM induction. Further analysis of global gene expression changes during reprogramming revealed unexpected downregulation of factors involved in mRNA processing and splicing. Detailed functional analysis of the top candidate splicing factor, Ptbp1, revealed that it is a critical barrier for the acquisition of cardiomyocyte-specific splicing patterns in fibroblasts. Concomitantly, Ptbp1 depletion promoted cardiac transcriptome acquisition and increased iCM reprogramming efficiency. Additional quantitative analysis of our dataset revealed a strong correlation between the expression of each reprogramming factor and the progress of individual cells through the reprogramming process, and led to the discovery of new surface markers for the enrichment of iCMs. In summary, our single-cell transcriptomics approaches enabled us to reconstruct the reprogramming trajectory and to uncover intermediate cell populations, gene pathways and regulators involved in iCM induction.


Nucleic Acids Research | 2016

Robust detection of alternative splicing in a population of single cells

Joshua D. Welch; Yin Hu; Jan F. Prins

Single cell RNA-seq experiments provide valuable insight into cellular heterogeneity but suffer from low coverage, 3′ bias and technical noise. These unique properties of single cell RNA-seq data make study of alternative splicing difficult, and thus most single cell studies have restricted analysis of transcriptome variation to the gene level. To address these limitations, we developed SingleSplice, which uses a statistical model to detect genes whose isoform usage shows biological variation significantly exceeding technical noise in a population of single cells. Importantly, SingleSplice is tailored to the unique demands of single cell analysis, detecting isoform usage differences without attempting to infer expression levels for full-length transcripts. Using data from spike-in transcripts, we found that our approach detects variation in isoform usage among single cells with high sensitivity and specificity. We also applied SingleSplice to data from mouse embryonic stem cells and discovered a set of genes that show significant biological variation in isoform usage across the set of cells. A subset of these isoform differences are linked to cell cycle stage, suggesting a novel connection between alternative splicing and the cell cycle.


BMC Genomics | 2015

Pseudogenes transcribed in breast invasive carcinoma show subtype-specific expression and ceRNA potential

Joshua D. Welch; Jeanette Baran-Gale; Charles M. Perou; Praveen Sethupathy; Jan F. Prins

Background: Recent studies have shown that some pseudogenes are transcribed and contribute to cancer when dysregulated. In particular, pseudogene transcripts can function as competing endogenous RNAs (ceRNAs). The high similarity of gene and pseudogene nucleotide sequence has hindered experimental investigation of these mechanisms using RNA-seq. Furthermore, previous studies of pseudogenes in breast cancer have not integrated miRNA expression data in order to perform large-scale analysis of ceRNA potential. Thus, knowledge of both pseudogene ceRNA function and the role of pseudogene expression in cancer are restricted to isolated examples.
Results: To investigate whether transcribed pseudogenes play a pervasive regulatory role in cancer, we developed a novel bioinformatic method for measuring pseudogene transcription from RNA-seq data. We applied this method to 819 breast cancer samples from The Cancer Genome Atlas (TCGA) project. We then clustered the samples using pseudogene expression levels and integrated sample-paired pseudogene, gene and miRNA expression data with miRNA target prediction to determine whether more pseudogenes have ceRNA potential than expected by chance.
Conclusions: Our analysis identifies with high confidence a set of 440 pseudogenes that are transcribed in breast cancer tissue. Of this set, 309 pseudogenes exhibit significant differential expression among breast cancer subtypes. Hierarchical clustering using only pseudogene expression levels accurately separates tumor samples from normal samples and discriminates the Basal subtype from the Luminal and Her2 subtypes. Correlation analysis shows more positively correlated pseudogene-parent gene pairs and negatively correlated pseudogene-miRNA pairs than expected by chance. Furthermore, 177 transcribed pseudogenes possess binding sites for co-expressed miRNAs that are also predicted to target their parent genes. Taken together, these results increase the catalog of putative pseudogene ceRNAs and suggest that pseudogene transcription in breast cancer may play a larger role than previously appreciated.
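A claim of "more correlated pairs than expected by chance" implies a comparison against a null distribution. The abstract does not say how that null was built, so the following is only a minimal sketch of one generic option: a permutation test that shuffles sample labels and recounts pairs whose Pearson correlation reaches a cutoff. The function names, the 0.5 threshold, and the number of permutations are assumptions, not values from the paper.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (vx * vy) ** 0.5

def count_correlated_pairs(pairs, threshold):
    """Number of (x, y) expression pairs with correlation >= threshold."""
    return sum(1 for x, y in pairs if pearson(x, y) >= threshold)

def permutation_p_value(pairs, threshold=0.5, n_perm=1000, seed=0):
    """Compare the observed pair count with counts after shuffling samples.

    Returns (observed_count, empirical_p_value).
    """
    rng = random.Random(seed)
    observed = count_correlated_pairs(pairs, threshold)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for x, y in pairs:
            y_perm = list(y)
            rng.shuffle(y_perm)  # break the sample pairing between x and y
            shuffled.append((x, y_perm))
        if count_correlated_pairs(shuffled, threshold) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Here each element of `pairs` would hold, for example, a pseudogene's expression values and its parent gene's values across the same samples. Testing for negatively correlated pairs would need the comparison direction flipped.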


BMC Bioinformatics | 2010

WordSeeker: concurrent bioinformatics software for discovering genome-wide patterns and word-based genomic signatures

Jens Lichtenberg; Kyle Kurz; Xiaoyu Liang; Rami Al-ouran; Lev Neiman; Lee J Nau; Joshua D. Welch; Edwin Jacox; Thomas Bitterman; Klaus Ecker; Laura Elnitski; Frank Drews; Stephen S. Lee; Lonnie R. Welch

Background: An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets.
Methods: This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model.
Results: A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center.
Conclusion: WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data.
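The enumerate-and-score loop described in the Methods can be illustrated with a small single-process sketch. This is not WordSeeker's code. The function names, the order-1 Markov background, the observed/expected ratio used as a score, and the `prefix` argument standing in for a worker's share of the word space are all assumptions chosen for illustration. Input is assumed to be uppercase A, C, G, T.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def count_words(sequences, k):
    """Count every overlapping length-k word across the input sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def expected_count(word, mono, di, n_positions):
    """Expected occurrences of `word` under an order-1 Markov model.

    P(word) = P(w1) * product of P(w_i | w_{i-1}), with probabilities
    estimated from mononucleotide and dinucleotide counts.
    """
    total = sum(mono[c] for c in ALPHABET)
    if total == 0:
        return 0.0
    prob = mono[word[0]] / total
    for prev, cur in zip(word, word[1:]):
        row_total = sum(di[prev + c] for c in ALPHABET)
        if row_total == 0:
            return 0.0
        prob *= di[prev + cur] / row_total
    return prob * n_positions

def score_words(sequences, k, prefix=""):
    """Score all length-k words starting with `prefix` as observed / expected.

    Words whose expected count is zero under the background model are skipped.
    """
    mono = count_words(sequences, 1)
    di = count_words(sequences, 2)
    observed = count_words(sequences, k)
    n_positions = sum(max(len(s) - k + 1, 0) for s in sequences)
    scores = {}
    for tail in product(ALPHABET, repeat=k - len(prefix)):
        word = prefix + "".join(tail)
        expected = expected_count(word, mono, di, n_positions)
        if expected > 0:
            scores[word] = observed[word] / expected
    return scores
```

Calling `score_words(seqs, 6, prefix="A")` scores only words beginning with A, which is one simple way a controller could hand each worker a disjoint slice of the word space.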


Genome Biology | 2017

MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics

Joshua D. Welch; Alexander J. Hartemink; Jan F. Prins

Single cell experimental techniques reveal transcriptomic and epigenetic heterogeneity among cells, but how these are related is unclear. We present MATCHER, an approach for integrating multiple types of single cell measurements. MATCHER uses manifold alignment to infer single cell multi-omic profiles from transcriptomic and epigenetic measurements performed on different cells of the same type. Using scM&T-seq and sc-GEM data, we confirm that MATCHER accurately predicts true single cell correlations between DNA methylation and gene expression without using known cell correspondences. MATCHER also reveals new insights into the dynamic interplay between the transcriptome and epigenome in single embryonic stem cells and induced pluripotent stem cells.


Nucleic Acids Research | 2016

A subset of replication-dependent histone mRNAs are expressed as polyadenylated RNAs in terminally differentiated tissues

Shawn M. Lyons; Clark H. Cunningham; Joshua D. Welch; Beezly S. Groh; Andrew Y. Guo; Bruce Wei; Michael L. Whitfield; Yue Xiong; William F. Marzluff

Histone proteins are synthesized in large amounts during S-phase to package the newly replicated DNA, and are among the most stable proteins in the cell. The replication-dependent (RD)-histone mRNAs expressed during S-phase end in a conserved stem-loop rather than a polyA tail. In addition, there are replication-independent (RI)-histone genes that encode histone variants as polyadenylated mRNAs. Most variants have specific functions in chromatin, but H3.3 also serves as a replacement histone for damaged histones in long-lived terminally differentiated cells. There are no reported replacement histone genes for histones H2A, H2B or H4. We report that a subset of RD-histone genes are expressed in terminally differentiated tissues as polyadenylated mRNAs, likely serving as replacement histone genes in long-lived non-dividing cells. Expression of two genes, HIST2H2AA3 and HIST1H2BC, is conserved in mammals. They are expressed as polyadenylated mRNAs in fibroblasts differentiated in vitro, but not in serum starved fibroblasts, suggesting that their expression is part of the terminal differentiation program. There are two histone H4 genes and an H3 gene that encode mRNAs that are polyadenylated and expressed at 5- to 10-fold lower levels than the mRNAs from H2A and H2B genes, which may be replacement genes for the H3.1 and H4 proteins.


Nucleic Acids Research | 2016

Selective single cell isolation for genomics using microraft arrays

Joshua D. Welch; Lindsay A. Williams; Matthew DiSalvo; Alicia T. Brandt; Raoud Marayati; Christopher E. Sims; Nancy L. Allbritton; Jan F. Prins; Jen Jen Yeh; Corbin D. Jones

Genomic methods are used increasingly to interrogate the individual cells that compose specific tissues. However, current methods for single cell isolation struggle to phenotypically differentiate specific cells in a heterogeneous population and rely primarily on the use of fluorescent markers. Many cellular phenotypes of interest are too complex to be measured by this approach, making it difficult to connect genotype and phenotype at the level of individual cells. Here we demonstrate that microraft arrays, which are arrays containing thousands of individual cell culture sites, can be used to select single cells based on a variety of phenotypes, such as cell surface markers, cell proliferation and drug response. We then show that a common genomic procedure, RNA-seq, can be readily adapted to the single cells isolated from these rafts. We show that data generated using microrafts and our modified RNA-seq protocol compared favorably with the Fluidigm C1. We then used microraft arrays to select pancreatic cancer cells that proliferate in spite of cytotoxic drug treatment. Our single cell RNA-seq data identified several expected and novel gene expression changes associated with early drug resistance.


2009 Ohio Collaborative Conference on Bioinformatics | 2009

Construction of Genomic Regulatory Encyclopedias: Strategies and Case Studies

Jens Lichtenberg; Mohit Alam; Thomas Bitterman; Frank Drews; Klaus Ecker; Laura Elnitski; Susan Evans; Matt Geisler; Erich Grotewold; Dazhang Gu; Edwin Jacox; Kyle Kurz; Stephen S. Lee; Xiaoyu Liang; Pooja M. Majmudar; Paul Morris; Chase Nelson; Eric Stockinger; Joshua D. Welch; Sarah Wyatt; Alper Yilmaz; Lonnie R. Welch

Encyclopedias of regulatory genomic elements provide a foundation for research in areas such as disease diagnosis, disease treatment, and crop enhancement. The construction of complete encyclopedias of organism-specific genomic elements involved in gene regulation remains a significant challenge. To address this problem, the authors present novel bioinformatics strategies for exploring the word landscapes of putative regulatory regions of genomes. The methods are incorporated into the WordSeeker software tool, which is available at http://word-seeker.org. The effectiveness of these strategies is demonstrated through several case studies.

Collaboration


Dive into Joshua D. Welch's collaborations.

Top Co-Authors

Jan F. Prins

University of North Carolina at Chapel Hill

William F. Marzluff

University of North Carolina at Chapel Hill
