Stephen J. Coleman | Researchain

Publication

Featured research published by Stephen J. Coleman.

Nucleic Acids Research | 2010

MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery

Kai Wang; Darshan Singh; Zheng Zeng; Stephen J. Coleman; Yan Huang; Gleb L. Savich; Xiaping He; Piotr A. Mieczkowski; Sara A. Grimm; Charles M. Perou; James N. MacLeod; Derek Y. Chiang; Jan F. Prins; Jinze Liu

The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long (≥75 bp) reads. MapSplice is not dependent on splice site features or intron length; consequently, it can detect novel canonical as well as non-canonical splices. MapSplice leverages the quality and diversity of read alignments of a given splice to increase accuracy. We demonstrate that MapSplice achieves higher sensitivity and specificity than TopHat and SpliceMap on a set of simulated RNA-seq data. Experimental studies also support the accuracy of the algorithm. Splice junctions derived from eight breast cancer RNA-seq datasets recapitulated the extensiveness of alternative splicing on a global level as well as the differences between molecular subtypes of breast cancer. These combined results indicate that MapSplice is a highly accurate algorithm for the alignment of RNA-seq reads to splice junctions. Software download URL: http://www.netlab.uky.edu/p/bioinfo/MapSplice.

Animal Genetics | 2010

Structural annotation of equine protein-coding genes determined by mRNA sequencing.

Stephen J. Coleman; Zheng Zeng; Kai Wang; S. Luo; I. Khrebtukova; Michael J. Mienaltowski; G. P. Schroth; Jinze Liu; James N. MacLeod

The horse, like the majority of animal species, has a limited amount of species-specific expressed sequence data available in public databases. As a result, structural models for the majority of genes defined in the equine genome are predictions based on ab initio sequence analysis or the projection of gene structures from other mammalian species. The current study used Illumina-based sequencing of messenger RNA (RNA-seq) to help refine structural annotation of equine protein-coding genes and for a preliminary assessment of gene expression patterns. Sequencing of mRNA from eight equine tissues generated 293,758,105 sequence tags of 35 bases each, equalling 10.28 Gbp of total sequence data. The tag alignments represent approximately 207× coverage of the equine mRNA transcriptome and confirmed transcriptional activity for roughly 90% of the protein-coding gene structures predicted by Ensembl and NCBI. Tag coverage was sufficient to refine the structural annotation for 11,356 of these predicted genes, while also identifying an additional 456 transcripts with exon/intron features that are not listed by either Ensembl or NCBI. Genomic locus data and intervals for the protein-coding genes predicted by the Ensembl and NCBI annotation pipelines were combined with 75,116 RNA-seq-derived transcriptional units to generate a consensus equine protein-coding gene set of 20,302 defined loci. Gene ontology annotation was used to compare the functional and structural categories of genes expressed in either a tissue-restricted pattern or broadly across all tissue samples.
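As a quick sanity check on the sequencing totals quoted in this abstract, the arithmetic can be sketched as follows. This is a minimal illustration, not part of the study: the variable names are hypothetical, and the tag count is taken to be 293,758,105 (an assumed reading of the figure printed in the abstract).

```python
# Figures quoted in the abstract. The grouping of digits in the tag
# count (293,758,105) is an assumption about the intended number.
TAG_COUNT = 293_758_105   # sequence tags generated from eight tissues
TAG_LENGTH_BP = 35        # bases per tag

# Total sequence output is simply tags multiplied by tag length.
total_bases = TAG_COUNT * TAG_LENGTH_BP
total_gbp = total_bases / 1e9

print(f"{total_bases:,} bases = {total_gbp:.2f} Gbp")
```

Under that reading, the product agrees with the 10.28 Gbp total stated in the abstract.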

Journal of Virology | 2011

Genome-Wide Association Study among Four Horse Breeds Identifies a Common Haplotype Associated with In Vitro CD3+ T Cell Susceptibility/Resistance to Equine Arteritis Virus Infection

Yun Young Go; Ernest Bailey; Deborah G. Cook; Stephen J. Coleman; James N. MacLeod; Kuey-Chu Chen; Peter J. Timoney; Udeni B.R. Balasuriya

Previously, we have shown that horses could be divided into susceptible and resistant groups based on an in vitro assay using dual-color flow cytometric analysis of CD3+ T cells infected with equine arteritis virus (EAV). Here, we demonstrate that the differences in in vitro susceptibility of equine CD3+ T lymphocytes to EAV infection have a genetic basis. To investigate the possible hereditary basis for this trait, we conducted a genome-wide association study (GWAS) to compare susceptible and resistant phenotypes. Testing of 267 DNA samples from four horse breeds that had a susceptible or a resistant CD3+ T lymphocyte phenotype using both the Illumina Equine SNP50 BeadChip and Sequenom's MassARRAY system identified a common, genetically dominant haplotype associated with the susceptible phenotype in a region of equine chromosome 11 (ECA11), positions 49572804 to 49643932. The presence of a common haplotype indicates that the trait occurred in a common ancestor of all four breeds, suggesting that it may be segregated among other modern horse breeds. Biological pathway analysis revealed several cellular genes within this region of ECA11 encoding proteins associated with virus attachment and entry, cytoskeletal organization, and NF-κB pathways that may be associated with the trait responsible for the in vitro susceptibility/resistance of CD3+ T lymphocytes to EAV infection. The data presented in this study demonstrated a strong association of genetic markers with the trait, representing de facto proof that the trait is under genetic control. To our knowledge, this is the first GWAS of an equine infectious disease and the first GWAS of equine viral arteritis.

Immunogenetics | 2009

Characterization of equine and other vertebrate TLR3, TLR7, and TLR8 genes

Natalia M. Astakhova; Andrey A. Perelygin; Andrey Zharkikh; Teri L. Lear; Stephen J. Coleman; James N. MacLeod; Margo A. Brinton

Toll-like receptors 3, 7, and 8 (TLR3, TLR7, and TLR8) were studied in the genomes of the domestic horse and several other mammals. The messenger RNA sequences and exon/intron structures of these TLR genes were determined. An equine bacterial artificial chromosome clone containing the TLR3 gene was assigned by fluorescent in situ hybridization to the horse chromosomal location ECA27q16–q17 and this map location was confirmed using an equine radiation hybrid panel. Direct sequencing revealed 13 single-nucleotide polymorphisms in the coding regions of the equine TLR 3, 7, and 8 genes. Of these polymorphisms, 12 were not previously reported. The allelic frequency was estimated for each single-nucleotide polymorphism from genotyping data obtained for 154 animals from five horse breeds. Some of these frequencies varied significantly among different horse breeds. Domain architecture predictions for the three equine TLR protein sequences revealed several conserved regions within the variable leucine-rich repeats between the corresponding horse and cattle TLR proteins. A phylogenetic analysis did not indicate that any significant exchanges had occurred between paralogous TLR7 and TLR8 genes in 20 vertebrate species analyzed.

PLOS ONE | 2015

Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads

Jovan D. Rebolledo-Mendez; Matthew S. Hestand; Stephen J. Coleman; Zheng Zeng; Ludovic Orlando; James N. MacLeod; Ted Kalbfleisch

The reference assembly for the domestic horse, EquCab2, published in 2009, was built using approximately 30 million Sanger reads from a Thoroughbred mare named Twilight. Contiguity in the assembly was facilitated using nearly 315 thousand BAC end sequences from Twilight’s half brother Bravo. Since then, it has served as the foundation for many genome-wide analyses that include not only the modern horse, but ancient horses and other equid species as well. As data mapped to this reference has accumulated, consistent variation between mapped datasets and the reference, in terms of regions with no read coverage, single nucleotide variants, and small insertions/deletions, has become apparent. In many cases, it is not clear whether these differences are the result of true sequence variation between the research subjects’ and Twilight’s genome or due to errors in the reference. EquCab2 is regarded as “The Twilight Assembly.” The objective of this study was to identify inconsistencies between the EquCab2 assembly and the source Twilight Sanger data used to build it. To that end, the original Sanger and BAC end reads have been mapped back to this equine reference and assessed with the addition of approximately 40X coverage of new Illumina Paired-End sequence data. The resulting mapped datasets identify those regions with low Sanger read coverage, as well as variation in genomic content that is not consistent with either the original Twilight Sanger data or the new genomic sequence data generated from Twilight on the Illumina platform. As the haploid EquCab2 reference assembly was created using Sanger reads derived largely from a single individual, the vast majority of variation detected in a mapped dataset comprised of those same Sanger reads should be heterozygous. In contrast, homozygous variations would represent either errors in the reference or contributions from Bravo’s BAC end sequences. Our analysis identifies 720,843 homozygous discrepancies between new, high throughput genomic sequence data generated for Twilight and the EquCab2 reference assembly. Most of these represent errors in the assembly, while approximately 10,000 are demonstrated to be contributions from another horse. Other results are presented that include the binary alignment map file of the mapped Sanger reads, a list of variants identified as discrepancies between the source data and resulting reference, and a BED annotation file that lists the regions of the genome whose consensus was likely derived from low coverage alignments.

PLOS ONE | 2013

Analysis of Unannotated Equine Transcripts Identified by mRNA Sequencing

Stephen J. Coleman; Zheng Zeng; Matthew S. Hestand; Jinze Liu; James N. MacLeod

Sequencing of equine mRNA (RNA-seq) identified 428 putative transcripts which do not map to any previously annotated or predicted horse genes. Most of these encode the equine homologs of known protein-coding genes described in other species, yet the potential exists to identify novel and perhaps equine-specific gene structures. A set of 36 transcripts was prioritized for further study by filtering for levels of expression (depth of RNA-seq read coverage), distance from annotated features in the equine genome, the number of putative exons, and patterns of gene expression between tissues. From these, four were selected for further investigation based on predicted open reading frames of greater than or equal to 50 amino acids and lack of detectable homology to known genes across species. Sanger sequencing of RT-PCR amplicons from additional equine samples confirmed expression and structural annotation of each transcript. Functional predictions were made by conserved domain searches. A single transcript, expressed in the cerebellum, contains a putative Krüppel-associated box (KRAB) domain, suggesting a potential function associated with zinc finger proteins and transcriptional regulation. Overall levels of conserved synteny and sequence conservation across a 1 Mb region surrounding each transcript were approximately 73% compared to the human, canine, and bovine genomes; however, the four loci display some areas of low conservation and sequence inversion in regions that immediately flank these previously unannotated equine transcripts. Taken together, the evidence suggests that these four transcripts are likely to be equine-specific.

PLOS ONE | 2015

Annotation of the Protein Coding Regions of the Equine Genome

Matthew S. Hestand; Theodore S. Kalbfleisch; Stephen J. Coleman; Zheng Zeng; Jinze Liu; Ludovic Orlando; James N. MacLeod

Current gene annotation of the horse genome is largely derived from in silico predictions and cross-species alignments. Only a small number of genes are annotated based on equine EST and mRNA sequences. To expand the number of equine genes annotated from equine experimental evidence, we sequenced mRNA from a pool of forty-three different tissues. From these, we derived the structures of 68,594 transcripts. In addition, we identified 301,829 positions with SNPs or small indels within these transcripts relative to EquCab2. Interestingly, 780 variants extend the open reading frame of the transcript and appear to be small errors in the equine reference genome, since they are also identified as homozygous variants by genomic DNA resequencing of the reference horse. Taken together, we provide a resource of equine mRNA structures and protein coding variants that will enhance equine and cross-species transcriptional and genomic comparisons.

BMC Bioinformatics | 2010

Analysis of equine protein-coding gene structure and expression by RNA-sequencing

Stephen J. Coleman; Zheng Zeng; Jinze Liu; James N. MacLeod

Background: RNA-sequencing (RNA-seq) data from eight equine tissue samples (34-day whole embryo, full term placental villous, adult testes, adult cerebellum, adult articular cartilage, adult LPS-stimulated articular cartilage, adult synovial membrane, and adult LPS-stimulated synovial membrane) were used to refine the structural annotation of protein-coding genes in the horse and for a preliminary assessment of tissue-specific expression patterns.

PLOS ONE | 2015

Tissue Restricted Splice Junctions Originate Not Only from Tissue-Specific Gene Loci, but Gene Loci with a Broad Pattern of Expression

Matthew S. Hestand; Zheng Zeng; Stephen J. Coleman; Jinze Liu; James N. MacLeod

Cellular mechanisms that achieve protein diversity in eukaryotes are multifaceted, including transcriptional components such as RNA splicing. Through alternative splicing, a single protein-coding gene can generate multiple mRNA transcripts and protein isoforms, some of which are tissue-specific. We have conducted qualitative and quantitative analyses of the Bodymap 2.0 messenger RNA-sequencing data from 16 human tissue samples and identified 209,363 splice junctions. Of these, 22,231 (10.6%) were not previously annotated and 21,650 (10.3%) were expressed in a tissue-restricted pattern. Tissue-restricted alternative splicing was found to be widespread, with approximately 65% of expressed multi-exon genes containing at least one tissue-specific splice junction. Interestingly, we observed many tissue-specific splice junctions not only in genes expressed in one or a few tissues, but also from gene loci with a broad pattern of expression.
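The percentages quoted in this abstract follow directly from the junction counts. A minimal sketch of that arithmetic is shown here; the constant and function names are hypothetical and are not taken from the paper.

```python
# Counts quoted in the abstract (Bodymap 2.0, 16 human tissue samples).
TOTAL_JUNCTIONS = 209_363
UNANNOTATED = 22_231
TISSUE_RESTRICTED = 21_650


def percent_of_total(count, total=TOTAL_JUNCTIONS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(f"Unannotated: {percent_of_total(UNANNOTATED)}%")
print(f"Tissue-restricted: {percent_of_total(TISSUE_RESTRICTED)}%")
```

Both values reproduce the figures given in the abstract (10.6% and 10.3%).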

PLOS ONE | 2013