Publication


Featured research published by Se-Ran Jun.


Proceedings of the National Academy of Sciences of the United States of America | 2009

Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions

Gregory E. Sims; Se-Ran Jun; Guohong A. Wu; Sung-Hou Kim

For comparison of whole-genome (genic + nongenic) sequences, multiple sequence alignment of a few selected genes is not appropriate. One approach is to use an alignment-free method in which feature (or l-mer) frequency profiles (FFP) of whole genomes are used for comparison—a variation of a text or book comparison method, using word frequency profiles. In this approach it is critical to identify the optimal resolution range of l-mers for the given set of genomes compared. The optimum FFP method is applicable for comparing whole genomes or large genomic regions even when there are no common genes with high homology. We outline the method in 3 stages: (i) We first show how the optimal resolution range can be determined with English books which have been transformed into long character strings by removing all punctuation and spaces. (ii) Next, we test the robustness of the optimized FFP method at the nucleotide level, using a mutation model with a wide range of base substitutions and rearrangements. (iii) Finally, to illustrate the utility of the method, phylogenies are reconstructed from concatenated mammalian intronic genomes; the FFP derived intronic genome topologies for each l within the optimal range are all very similar. The topology agrees with the established mammalian phylogeny revealing that intron regions contain a similar level of phylogenic signal as do coding regions.
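The first stage described above (turning a book into a long character string and profiling its l-mers) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy sentence are hypothetical, and keeping only ASCII letters in lower case is an assumption about how punctuation and spaces are removed.

```python
from collections import Counter
import string


def to_character_string(text):
    """Lower-case the text and keep only ASCII letters (assumed cleanup rule)."""
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def feature_frequency_profile(seq, length):
    """Relative frequency of every overlapping l-mer of the given length."""
    if length < 1 or len(seq) < length:
        raise ValueError("sequence must be at least `length` characters long")
    counts = Counter(seq[i:i + length] for i in range(len(seq) - length + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


# Toy "book" (hypothetical input).
book = "The cat sat. The cat ran!"
chars = to_character_string(book)
profile = feature_frequency_profile(chars, 3)
```

Repeating the last call over a range of `length` values is one way to explore how the profile changes with resolution; how the optimal range is actually chosen is not specified in the abstract.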


Proceedings of the National Academy of Sciences of the United States of America | 2010

Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution

Se-Ran Jun; Gregory E. Sims; Guohong A. Wu; Sung-Hou Kim

We present a whole-proteome phylogeny of prokaryotes constructed by comparing feature frequency profiles (FFPs) of whole proteomes. Features are l-mers of amino acids, and each organism is represented by a profile of frequencies of all features. The selection of feature length is critical in the FFP method, and we have developed a procedure for identifying the optimal feature lengths for inferring the phylogeny of prokaryotes, strictly speaking, a proteome phylogeny. Our FFP trees are constructed with whole proteomes of 884 prokaryotes, 16 unicellular eukaryotes, and 2 random sequences. To highlight the branching order of major groups, we present a simplified proteome FFP tree of monophyletic class or phylum with branch support. In our whole-proteome FFP trees (i) Archaea, Bacteria, Eukaryota, and a random sequence outgroup are clearly separated; (ii) Archaea and Bacteria form a sister group when rooted with random sequences; (iii) Planctomycetes, which possesses an intracellular membrane compartment, is placed at the basal position of the Bacteria domain; (iv) almost all groups are monophyletic in prokaryotes at most taxonomic levels, but many differences in the branching order of major groups are observed between our proteome FFP tree and trees built with other methods; and (v) previously “unclassified” genomes may be assigned to the most likely taxa. We describe notable similarities and differences between our FFP trees and those based on other methods in grouping and phylogeny of prokaryotes.
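A pairwise distance matrix between proteome profiles might be computed along these lines. The sketch assumes Jensen-Shannon divergence as the distance, which the abstract does not name, and uses short hand-written strings as stand-ins for proteomes and for the random-sequence outgroup; all names are hypothetical.

```python
from collections import Counter
from math import log2


def profile(seq, length):
    """Relative frequencies of overlapping amino-acid l-mers."""
    counts = Counter(seq[i:i + length] for i in range(len(seq) - length + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between sparse profiles, in [0, 1]."""
    div = 0.0
    for feature in set(p) | set(q):
        pi, qi = p.get(feature, 0.0), q.get(feature, 0.0)
        mi = 0.5 * (pi + qi)
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


def distance_matrix(proteomes, length):
    """Return sorted names and the matrix of pairwise divergences."""
    names = sorted(proteomes)
    profiles = {name: profile(proteomes[name], length) for name in names}
    matrix = [[js_divergence(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix


# Toy stand-ins for whole proteomes (hypothetical sequences).
proteomes = {
    "org_a": "MKVLAAGMKVLA",
    "org_b": "MKVLAAGMKVLS",
    "random": "QWERTYIPASDF",
}
names, matrix = distance_matrix(proteomes, 2)
```

In this toy input the two related sequences share most 2-mers, while the outgroup shares none with them and therefore sits at the maximum distance.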


Proceedings of the National Academy of Sciences of the United States of America | 2009

Whole-genome phylogeny of mammals: Evolutionary information in genic and nongenic regions

Gregory E. Sims; Se-Ran Jun; Guohong Albert Wu; Sung-Hou Kim

Ten complete mammalian genome sequences were compared by using the “feature frequency profile” (FFP) method of alignment-free comparison. This comparison technique reveals that the whole nongenic portion of mammalian genomes contains evolutionary information that is similar to their genic counterparts—the intron and exon regions. We partitioned the complete genomes of mammals (such as human, chimp, horse, and mouse) into their constituent nongenic, intronic, and exonic components. Phylogenic species trees were constructed for each individual component class of genome sequence data as well as the whole genomes by using standard tree-building algorithms with FFP distances. The phylogenies of the whole genomes and each of the component classes (exonic, intronic, and nongenic regions) have similar topologies, within the optimal feature length range, and all agree well with the evolutionary phylogeny based on a recent large dataset, multispecies, and multigene-based alignment. In the strictest sense, the FFP-based trees are genome phylogenies, not species phylogenies. However, the species phylogeny is highly related to the whole-genome phylogeny. Furthermore, our results reveal that the footprints of evolutionary history are spread throughout the entire length of the whole genome of an organism and are not limited to genes, introns, or short, highly conserved, nongenic sequences that can be adversely affected by factors (such as a choice of sequences, homoplasy, and different mutation rates) resulting in inconsistent species phylogenies.
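The partitioning step can be illustrated with a small sketch. It assumes 0-based, half-open coordinates, that every exon lies inside a gene, and that any gene position not covered by an exon counts as intronic; strands and untranslated regions are ignored. The function name and the toy sequence are hypothetical.

```python
def partition_genome(genome, genes, exons):
    """Split a sequence into concatenated exonic, intronic and nongenic parts."""
    label = ["nongenic"] * len(genome)
    for start, end in genes:          # everything inside a gene starts as intronic
        for i in range(start, end):
            label[i] = "intronic"
    for start, end in exons:          # exons overwrite the intronic label
        for i in range(start, end):
            label[i] = "exonic"
    parts = {"exonic": [], "intronic": [], "nongenic": []}
    for base, tag in zip(genome, label):
        parts[tag].append(base)
    return {tag: "".join(bases) for tag, bases in parts.items()}


# Toy sequence: flanking region, exon, intron, exon, flanking region.
genome = "aaaa" + "ATG" + "gta" + "TAA" + "cccc"
parts = partition_genome(genome, genes=[(4, 13)], exons=[(4, 7), (10, 13)])
```

Each of the three concatenated strings could then be profiled separately, as could the unpartitioned sequence.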


Proceedings of the National Academy of Sciences of the United States of America | 2009

Whole-proteome phylogeny of large dsDNA virus families by an alignment-free method

Guohong Albert Wu; Se-Ran Jun; Gregory E. Sims; Sung-Hou Kim

The vast sequence divergence among different virus groups has presented a great challenge to alignment-based sequence comparison among different virus families. Using an alignment-free comparison method, we construct the whole-proteome phylogeny for a population of viruses from 11 viral families comprising 142 large dsDNA eukaryote viruses. The method is based on the feature frequency profiles (FFP), where the length of the feature (l-mer) is selected to be optimal for phylogenomic inference. We observe that (i) the FFP phylogeny segregates the population into clades, the membership of each has remarkable agreement with current classification by the International Committee on the Taxonomy of Viruses, with one exception that the mimivirus joins the phycodnavirus family; (ii) the FFP tree detects potential evolutionary relationships among some viral families; (iii) the relative position of the 3 herpesvirus subfamilies in the FFP tree differs from gene alignment-based analysis; (iv) the FFP tree suggests the taxonomic positions of certain “unclassified” viruses; and (v) the FFP method identifies candidates for horizontal gene transfer between virus families.
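Going from a matrix of pairwise FFP distances to clades can be sketched with UPGMA (average-linkage clustering). UPGMA is chosen here only because it is short; the abstract does not say which tree-building algorithm was used, and the names and distances below are invented for illustration.

```python
def upgma(names, dist):
    """Build a nested-tuple tree from a symmetric distance matrix by UPGMA."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (tree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)                  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        merged = {}
        for c in clusters:                        # size-weighted average distance
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            merged[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(merged)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances between four genomes.
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.6, 0.7],
    [0.6, 0.6, 0.0, 0.2],
    [0.7, 0.7, 0.2, 0.0],
]
tree = upgma(names, dist)
```

The nested tuples group A with B and C with D, which is the clade structure implied by the toy distances.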


Applied and Environmental Microbiology | 2016

Global Genomic Epidemiology of Salmonella enterica Serovar Typhimurium DT104

Pimlapas Leekitcharoenphon; Rene S. Hendriksen; Simon Le Hello; François-Xavier Weill; Dorte Lau Baggesen; Se-Ran Jun; David W. Ussery; Ole Lund; Derrick W. Crook; Daniel J. Wilson; Frank Møller Aarestrup

It has been 30 years since the initial emergence and subsequent rapid global spread of multidrug-resistant Salmonella enterica serovar Typhimurium DT104 (MDR DT104). Nonetheless, its origin and transmission route have never been revealed. We used whole-genome sequencing (WGS) and temporally structured sequence analysis within a Bayesian framework to reconstruct temporal and spatial phylogenetic trees and estimate the rates of mutation and divergence times of 315 S. Typhimurium DT104 isolates sampled from 1969 to 2012 from 21 countries on six continents. DT104 was estimated to have emerged initially as antimicrobial susceptible in ∼1948 (95% credible interval [CI], 1934 to 1962) and later became MDR DT104 in ∼1972 (95% CI, 1972 to 1988) through horizontal transfer of the 13-kb Salmonella genomic island 1 (SGI1) MDR region into susceptible strains already containing SGI1. This was followed by multiple transmission events, initially from central Europe and later between several European countries. An independent transmission to the United States and another to Japan occurred, and from there MDR DT104 was probably transmitted to Taiwan and Canada. An independent acquisition of resistance genes took place in Thailand in ∼1975 (95% CI, 1975 to 1990). In Denmark, WGS analysis provided evidence for transmission of the organism between herds of animals. Interestingly, the demographic history of Danish MDR DT104 provided evidence for the success of the program to eradicate Salmonella from pig herds in Denmark from 1996 to 2000. The results from this study refute several hypotheses on the evolution of DT104 and suggest that WGS may be useful in monitoring emerging clones and devising strategies for prevention of Salmonella infections.


Frontiers in Microbiology | 2015

Metabolic functions of Pseudomonas fluorescens strains from Populus deltoides depend on rhizosphere or endosphere isolation compartment

Collin M. Timm; Alisha G. Campbell; Sagar M. Utturkar; Se-Ran Jun; Rebecca E. Parales; Watumesa A. Tan; Michael S. Robeson; Tse-Yuan S. Lu; Sara Jawdy; Steven D. Brown; David W. Ussery; Christopher W. Schadt; Gerald A. Tuskan; Mitchel J. Doktycz; David J. Weston; Dale A. Pelletier

The bacterial microbiota of plants is diverse, with 1000s of operational taxonomic units (OTUs) associated with any individual plant. In this work, we used phenotypic analysis, comparative genomics, and metabolic models to investigate the differences between 19 sequenced Pseudomonas fluorescens strains. These isolates represent a single OTU and were collected from the rhizosphere and endosphere of Populus deltoides. While no traits were exclusive to either endosphere or rhizosphere P. fluorescens isolates, multiple pathways relevant for plant-bacterial interactions are enriched in endosphere isolate genomes. Further, growth phenotypes such as phosphate solubilization, protease activity, denitrification and root growth promotion are biased toward endosphere isolates. Endosphere isolates have significantly more metabolic pathways for plant signaling compounds and an increased metabolic range that includes utilization of energy rich nucleotides and sugars, consistent with endosphere colonization. Rhizosphere P. fluorescens have fewer pathways representative of plant-bacterial interactions but show metabolic bias toward chemical substrates often found in root exudates. This work reveals the diverse functions that may contribute to colonization of the endosphere by bacteria and are enriched among closely related isolates.


Applied and Environmental Microbiology | 2016

Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis

Se-Ran Jun; Trudy M. Wassenaar; Intawat Nookaew; Loren Hauser; Visanu Wanchai; Miriam Land; Collin M. Timm; Tse-Yuan S. Lu; Christopher W. Schadt; Mitchel J. Doktycz; Dale A. Pelletier; David W. Ussery

The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.
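The pangenome, core genome and subgroup-specific gene sets mentioned above reduce to set operations once each genome is represented as a set of gene families. The sketch below assumes that clustering of proteins into families has already been done; the genome labels and family names are hypothetical.

```python
def pan_and_core(genomes):
    """Pangenome = union of all gene families; core genome = their intersection."""
    families = [set(f) for f in genomes.values()]
    pan = set().union(*families)
    core = set.intersection(*families) if families else set()
    return pan, core


def group_specific(genomes, group):
    """Families found in every genome of `group` and in no genome outside it."""
    inside = set.intersection(*(set(genomes[g]) for g in group))
    outside = set().union(*(genomes[g] for g in genomes if g not in group))
    return inside - outside


# Toy gene-family content (hypothetical names).
genomes = {
    "iso1": {"gyrB", "rpoD", "tonB_rec", "sensorK"},
    "iso2": {"gyrB", "rpoD", "tonB_rec"},
    "ref": {"gyrB", "rpoD", "pvdA"},
}
pan, core = pan_and_core(genomes)
specific = group_specific(genomes, ["iso1", "iso2"])
```

`group_specific` expects a non-empty group; with real data the result depends heavily on how gene families were defined.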


FEMS Microbiology Reviews | 2015

Ebolavirus comparative genomics

Se-Ran Jun; Michael R. Leuze; Intawat Nookaew; Edward C. Uberbacher; Miriam Land; Qian Zhang; Visanu Wanchai; Juanjuan Chai; Morten Nielsen; Thomas Trolle; Ole Lund; Gregory S. Buzard; Thomas Pedersen; Trudy M. Wassenaar; David W. Ussery

The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.


Standards in Genomic Sciences | 2014

Quality scores for 32,000 genomes

Miriam Land; Doug Hyatt; Se-Ran Jun; Guruprasad Kora; Loren Hauser; Oksana Lukjancenko; David W. Ussery

Background: More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). We have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Results: Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. Conclusions: The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
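The link between anticodons beginning with ‘A’ and codons ending with ‘U’ follows from antiparallel base pairing, which a short sketch makes concrete. It assumes strict Watson-Crick pairing and ignores wobble pairing and base modifications; the function names and the toy anticodon list are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codon_for_anticodon(anticodon):
    """Codon (5'->3') that pairs with an anticodon given 5'->3'."""
    rna = anticodon.upper().replace("T", "U")   # accept DNA-alphabet input
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def first_base_counts(anticodons):
    """Tally predicted anticodons by their first (5') base."""
    return Counter(a.upper().replace("T", "U")[0] for a in anticodons)


arg_codon = codon_for_anticodon("ACG")
counts = first_base_counts(["ACG", "GCC", "UUU", "CAT"])
```

Applied to a table of predicted tRNAs, a tally like `first_base_counts` would show how rare the ‘A’ category is.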


BMC Research Notes | 2015

PanFP: pangenome-based functional profiles for microbial communities

Se-Ran Jun; Michael S. Robeson; Loren Hauser; Christopher W. Schadt; Andrey Gorin

Background: For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost-effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data. Results: We present a computational method called pangenome-based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU’s taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome’s functional profile with the OTU’s abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8–0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of the survey data analysed here. But our method is unique in that any OTU building method can be used, as opposed to being limited to closed-reference OTU picking strategies against specific reference sequence databases. Conclusions: We developed an automated computational method, which derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost-effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub (https://github.com/srjun/PanFP).
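The abundance weighting and the Spearman validation described in this abstract can be sketched roughly as follows. This is not the PanFP code: summing the weighted OTU profiles into one community profile is an assumption, the function identifiers (`K00001` and so on) are placeholders, and all numbers are invented.

```python
def community_profile(otu_abundance, pangenome_profiles):
    """Sum each OTU's pangenome functional profile weighted by its abundance."""
    total = {}
    for otu, abundance in otu_abundance.items():
        for function, count in pangenome_profiles.get(otu, {}).items():
            total[function] = total.get(function, 0.0) + abundance * count
    return total


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)   # undefined if either vector is constant


# Toy sample (hypothetical OTUs, functions and counts).
otu_abundance = {"OTU1": 10, "OTU2": 5}
pangenome_profiles = {
    "OTU1": {"K00001": 1.0, "K00002": 0.5},
    "OTU2": {"K00002": 2.0, "K00003": 1.0},
}
predicted = community_profile(otu_abundance, pangenome_profiles)
measured = {"K00001": 8.0, "K00002": 20.0, "K00003": 3.0}  # stand-in for shotgun data
functions = sorted(predicted)
rho = spearman([predicted[f] for f in functions], [measured[f] for f in functions])
```

Because only ranks enter the comparison, the predicted and measured profiles do not need to be on the same scale.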

Collaboration


Dive into Se-Ran Jun's collaborations.

Top Co-Authors

David W. Ussery

University of Arkansas for Medical Sciences

Intawat Nookaew

University of Arkansas for Medical Sciences

Trudy M. Wassenaar

Technical University of Denmark

Sung-Hou Kim

University of California

Loren Hauser

Oak Ridge National Laboratory

Visanu Wanchai

University of Arkansas for Medical Sciences

Christopher W. Schadt

Oak Ridge National Laboratory

Michael R. Leuze

Oak Ridge National Laboratory

Miriam Land

Oak Ridge National Laboratory
