Publication


Featured research published by Kristian A. Stevens.


Genome Biology | 2014

Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies

David B. Neale; Jill L. Wegrzyn; Kristian A. Stevens; Aleksey V. Zimin; Daniela Puiu; Marc W. Crepeau; Charis Cardeno; Maxim Koriabine; Ann Holtz-Morris; John D. Liechty; Pedro J. Martínez-García; Hans A. Vasquez-Gross; Brian Y. Lin; Jacob J. Zieve; William M. Dougherty; Sara Fuentes-Soriano; Le Shin Wu; Don Gilbert; Guillaume Marçais; Michael Roberts; Carson Holt; Mark Yandell; John M. Davis; Katherine E. Smith; Jeffrey F. D. Dean; W. Walter Lorenz; Ross W. Whetten; Ronald R. Sederoff; Nicholas Wheeler; Patrick E. McGuire

Background: The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results: We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions: In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.


Genetics | 2012

Genomic Variation in Natural Populations of Drosophila melanogaster

Charles H. Langley; Kristian A. Stevens; Charis Cardeno; Yuh Chwen G. Lee; Daniel R. Schrider; John E. Pool; Sasha A. Langley; Charlyn Suarez; Russell Corbett-Detig; Bryan Kolaczkowski; Shu Fang; Phillip M. Nista; Alisha K. Holloway; Andrew D. Kern; Colin N. Dewey; Yun S. Song; Matthew W. Hahn; David J. Begun

This report of independent genome sequences of two natural populations of Drosophila melanogaster (37 from North America and 6 from Africa) provides unique insight into forces shaping genomic polymorphism and divergence. Evidence of interactions between natural selection and genetic linkage is abundant not only in centromere- and telomere-proximal regions, but also throughout the euchromatic arms. Linkage disequilibrium, which decays within 1 kbp, exhibits a strong bias toward coupling of the more frequent alleles and provides a high-resolution map of recombination rate. The juxtaposition of population genetics statistics in small genomic windows with gene structures and chromatin states yields a rich, high-resolution annotation, including the following: (1) 5′- and 3′-UTRs are enriched for regions of reduced polymorphism relative to lineage-specific divergence; (2) exons overlap with windows of excess relative polymorphism; (3) epigenetic marks associated with active transcription initiation sites overlap with regions of reduced relative polymorphism and relatively reduced estimates of the rate of recombination; (4) the rate of adaptive nonsynonymous fixation increases with the rate of crossing over per base pair; and (5) both duplications and deletions are enriched near origins of replication and their density correlates negatively with the rate of crossing over. Available demographic models of X and autosome descent cannot account for the increased divergence on the X and loss of diversity associated with the out-of-Africa migration. Comparison of the variation among these genomes to variation among genomes from D. simulans suggests that many targets of directional selection are shared between these species.


PLOS Genetics | 2012

Population Genomics of Sub-Saharan Drosophila melanogaster: African Diversity and Non-African Admixture

John E. Pool; Russell B. Corbett-Detig; Ryuichi P. Sugino; Kristian A. Stevens; Charis Cardeno; Marc W. Crepeau; Pablo Duchen; J. J. Emerson; Perot Saelao; David J. Begun; Charles H. Langley

Drosophila melanogaster has played a pivotal role in the development of modern population genetics. However, many basic questions regarding the demographic and adaptive history of this species remain unresolved. We report the genome sequencing of 139 wild-derived strains of D. melanogaster, representing 22 population samples from the sub-Saharan ancestral range of this species, along with one European population. Most genomes were sequenced above 25X depth from haploid embryos. Results indicated a pervasive influence of non-African admixture in many African populations, motivating the development and application of a novel admixture detection method. Admixture proportions varied among populations, with greater admixture in urban locations. Admixture levels also varied across the genome, with localized peaks and valleys suggestive of a non-neutral introgression process. Genomes from the same location differed starkly in ancestry, suggesting that isolation mechanisms may exist within African populations. After removing putatively admixed genomic segments, the greatest genetic diversity was observed in southern Africa (e.g. Zambia), while diversity in other populations was largely consistent with a geographic expansion from this potentially ancestral region. The European population showed different levels of diversity reduction on each chromosome arm, and some African populations displayed chromosome arm-specific diversity reductions. Inversions in the European sample were associated with strong elevations in diversity across chromosome arms. Genomic scans were conducted to identify loci that may represent targets of positive selection within an African population, between African populations, and between European and African populations. A disproportionate number of candidate selective sweep regions were located near genes with varied roles in gene regulation. Outliers for Europe-Africa FST were found to be enriched in genomic regions of locally elevated cosmopolitan admixture, possibly reflecting a role for some of these loci in driving the introgression of non-African alleles into African populations.


Genetics | 2014

Sequencing and assembly of the 22-Gb loblolly pine genome.

Aleksey V. Zimin; Kristian A. Stevens; Marc W. Crepeau; Ann Holtz-Morris; Maxim Koriabine; Guillaume Marçais; Daniela Puiu; Michael Roberts; Jill L. Wegrzyn; Pieter J. de Jong; David B. Neale; James A. Yorke; Charles H. Langley

Conifers are the predominant gymnosperm. The size and complexity of their genomes has presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
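
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the paper or from its assembly pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up scaffold lengths for illustration only (not real pine data).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

With these toy values the total is 300, and the two longest scaffolds (80 + 70) already cover half of it, so the function reports 70.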


Genetics | 2014

Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation

Jill L. Wegrzyn; John D. Liechty; Kristian A. Stevens; Le Shin Wu; Carol A. Loopstra; Hans A. Vasquez-Gross; William M. Dougherty; Brian Y. Lin; Jacob J. Zieve; Pedro J. Martínez-García; Carson Holt; Mark Yandell; Aleksey V. Zimin; James A. Yorke; Marc W. Crepeau; Daniela Puiu; Pieter J. de Jong; Keithanne Mockaitis; Doreen Main; Charles H. Langley; David B. Neale

The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20–40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%.


Genetics | 2015

The Drosophila Genome Nexus: A Population Genomic Resource of 623 Drosophila melanogaster Genomes, Including 197 from a Single Ancestral Range Population

Justin B. Lack; Charis Cardeno; Marc W. Crepeau; William Taylor; Russell B. Corbett-Detig; Kristian A. Stevens; Charles H. Langley; John E. Pool

Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.


Genome Research | 2009

BayesCall: A model-based base-calling algorithm for high-throughput short-read sequencing

Wei-Chun Kao; Kristian A. Stevens; Yun S. Song

Extracting sequence information from raw images of fluorescence is the foundation underlying several high-throughput sequencing platforms. Some of the main challenges associated with this technology include reducing the error rate, assigning accurate base-specific quality scores, and reducing the cost of sequencing by increasing the throughput per run. To demonstrate how computational advancement can help to meet these challenges, a novel model-based base-calling algorithm, BayesCall, is introduced for the Illumina sequencing platform. Being founded on the tools of statistical learning, BayesCall is flexible enough to incorporate various features of the sequencing process. In particular, it can easily incorporate time-dependent parameters and model residual effects. This new approach significantly improves the accuracy over Illumina's base-caller Bustard, particularly in the later cycles of a sequencing run. For 76-cycle data on a standard viral sample, phiX174, BayesCall improves Bustard's average per-base error rate by approximately 51%. The probability of observing each base can be readily computed in BayesCall, and this probability can be transformed into a useful base-specific quality score with a high discrimination ability. A detailed study of BayesCall's performance is presented here.
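
The abstract does not say which transformation BayesCall uses to turn a base probability into a quality score. As a hedged illustration only, the sketch below applies the conventional Phred scaling, Q = -10 log10(1 - p), where p is the probability that the called base is correct. The function name `phred_quality` and the cap of 60 are assumptions made for this example, not part of BayesCall.

```python
import math


def phred_quality(p_correct, cap=60):
    """Convert the probability that a base call is correct into a
    Phred-scaled quality score, Q = -10 * log10(1 - p_correct).

    The cap (an assumption for this sketch) avoids an infinite score
    when the error probability is exactly zero.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    p_error = 1.0 - p_correct
    if p_error <= 0.0:
        return cap
    return min(cap, round(-10.0 * math.log10(p_error)))


# Illustrative probabilities only (not BayesCall output).
for p in (0.9, 0.99, 0.999):
    print(p, phred_quality(p))
```

Under this scaling, each tenfold drop in error probability adds 10 to the score, so a 1% error probability corresponds to Q20 and 0.1% to Q30.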


Genome Research | 2011

Genome-wide analysis of retrogene polymorphisms in Drosophila melanogaster

Daniel R. Schrider; Kristian A. Stevens; Charis Cardeno; Charles H. Langley; Matthew W. Hahn

Gene duplication via retrotransposition has been shown to be an important mechanism in evolution, affecting gene dosage and allowing for the acquisition of new gene functions. Although fixed retrotransposed genes have been found in a variety of species, very little effort has been made to identify retrogene polymorphisms. Here, we examine 37 Illumina-sequenced North American Drosophila melanogaster inbred lines and present the first ever data set and analysis of polymorphic retrogenes in Drosophila. We show that this type of polymorphism is quite common, with any two gametes in the North American population differing in the presence or absence of six retrogenes, accounting for ~13% of gene copy-number heterozygosity. These retrogenes were identified by a straightforward method that can be applied using any type of DNA sequencing data. We also use a variant of this method to conduct a genome-wide scan for intron presence/absence polymorphisms, and show that any two chromosomes in the population likely differ in the presence of multiple introns. We show that these polymorphisms are all in fact deletions rather than intron gain events present in the reference genome. Finally, by leveraging the known location of the parental genes that give rise to the retrogene polymorphisms, we provide direct evidence that natural selection is responsible for the excess of fixations of retrogenes moving off of the X chromosome in Drosophila. Further efforts to identify retrogene and intron presence/absence polymorphisms will undoubtedly improve our understanding of the evolution of gene copy number and gene structure.


Genetics | 2011

Circumventing heterozygosity: sequencing the amplified genome of a single haploid Drosophila melanogaster embryo.

Charles H. Langley; Marc W. Crepeau; Charis Cardeno; Russell Corbett-Detig; Kristian A. Stevens

Heterozygosity is a major challenge to efficient, high-quality genomic assembly and to the full genomic survey of polymorphism and divergence. In Drosophila melanogaster, lines derived from equatorial populations are particularly resistant to inbreeding, thus imposing a major barrier to the determination and analyses of genomic variation in natural populations of this model organism. Here we present a simple genome sequencing protocol based on the whole-genome amplification of the gynogenetically derived haploid genome of a progeny of females mated to males homozygous for the recessive male sterile mutation, ms(3)K81. A single “lane” of paired-end sequences (2 × 76 bp) provides a good syntenic assembly with >95% high-quality coverage (more than five reads). The amplification of the genomic DNA moderately inflates the variation in coverage across the euchromatic portion of the genome. It also increases the frequency of chimeric clones. But the low frequency and random genomic distribution of the chimeric clones limit their impact on the final assemblies. This method provides a solid path forward for population genomic sequencing and offers applications to many other systems in which small amounts of genomic DNA have unique experimental relevance.


Genetics | 2016

Sequence of the Sugar Pine Megagenome.

Kristian A. Stevens; Jill L. Wegrzyn; Aleksey V. Zimin; Daniela Puiu; Marc W. Crepeau; Charis Cardeno; Robin Paul; Daniel Gonzalez-Ibeas; Maxim Koriabine; Ann Holtz-Morris; Pedro J. Martínez-García; Uzay U. Sezen; Guillaume Marçais; Kathy Jermstad; Patrick E. McGuire; Carol A. Loopstra; John M. Davis; Andrew J. Eckert; Pieter J. de Jong; James A. Yorke; David B. Neale; Charles H. Langley

Until very recently, complete characterization of the megagenomes of conifers has remained elusive. The diploid genome of sugar pine (Pinus lambertiana Dougl.) has a highly repetitive, 31 billion bp genome. It is the largest genome sequenced and assembled to date, and the first from the subgenus Strobus, or white pines, a group that is notable for having the largest genomes among the pines. The genome represents a unique opportunity to investigate genome “obesity” in conifers and white pines. Comparative analysis of P. lambertiana and P. taeda L. reveals new insights on the conservation, age, and diversity of the highly abundant transposable elements, the primary factor determining genome size. Like most North American white pines, the principal pathogen of P. lambertiana is white pine blister rust (Cronartium ribicola J.C. Fischer ex Raben.). Identification of candidate genes for resistance to this pathogen is of great ecological importance. The genome sequence afforded us the opportunity to make substantial progress on locating the major dominant gene for simple resistance hypersensitive response, Cr1. We describe new markers and gene annotation that are both tightly linked to Cr1 in a mapping population, and associated with Cr1 in unrelated sugar pine individuals sampled throughout the species’ range, creating a solid foundation for future mapping. This genomic variation and annotated candidate genes characterized in our study of the Cr1 region are resources for future marker-assisted breeding efforts as well as for investigations of fundamental mechanisms of invasive disease and evolutionary response.

Collaboration


Top Co-Authors

David B. Neale

University of California

Jill L. Wegrzyn

University of Connecticut

Charis Cardeno

University of California

Ann Holtz-Morris

Children's Hospital Oakland Research Institute
