Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Gregory E. Sims is active.

Publication


Featured researches published by Gregory E. Sims.


Proceedings of the National Academy of Sciences of the United States of America | 2009

Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions

Gregory E. Sims; Se-Ran Jun; Guohong A. Wu; Sung-Hou Kim

For comparison of whole-genome (genic + nongenic) sequences, multiple sequence alignment of a few selected genes is not appropriate. One approach is to use an alignment-free method in which feature (or l-mer) frequency profiles (FFP) of whole genomes are used for comparison—a variation of a text or book comparison method, using word frequency profiles. In this approach it is critical to identify the optimal resolution range of l-mers for the given set of genomes compared. The optimum FFP method is applicable for comparing whole genomes or large genomic regions even when there are no common genes with high homology. We outline the method in 3 stages: (i) We first show how the optimal resolution range can be determined with English books which have been transformed into long character strings by removing all punctuation and spaces. (ii) Next, we test the robustness of the optimized FFP method at the nucleotide level, using a mutation model with a wide range of base substitutions and rearrangements. (iii) Finally, to illustrate the utility of the method, phylogenies are reconstructed from concatenated mammalian intronic genomes; the FFP derived intronic genome topologies for each l within the optimal range are all very similar. The topology agrees with the established mammalian phylogeny revealing that intron regions contain a similar level of phylogenic signal as do coding regions.


Proceedings of the National Academy of Sciences of the United States of America | 2003

A global representation of the protein fold space

Jingtong Hou; Gregory E. Sims; Chao Zhang; Sung-Hou Kim

One of the principal goals of the structural genomics initiative is to identify the total repertoire of protein folds and obtain a global view of the “protein structure universe.” Here, we present a 3D map of the protein fold space in which structurally related folds are represented by spatially adjacent points. Such a representation reveals a high-level organization of the fold space that is intuitively interpretable. The shape of the fold space and the overall distribution of the folds are defined by three dominant trends: secondary structure class, chain topology, and protein domain size. Random coil-like structures of small proteins and peptides are mapped to a region where the three trends converge, offering an interesting perspective on both the demography of fold space and the evolution of protein structures.


Proceedings of the National Academy of Sciences of the United States of America | 2010

Whole-proteome phylogeny of prokaryotes by feature frequency profiles: An alignment-free method with optimal feature resolution

Se-Ran Jun; Gregory E. Sims; Guohong A. Wu; Sung-Hou Kim

We present a whole-proteome phylogeny of prokaryotes constructed by comparing feature frequency profiles (FFPs) of whole proteomes. Features are l-mers of amino acids, and each organism is represented by a profile of frequencies of all features. The selection of feature length is critical in the FFP method, and we have developed a procedure for identifying the optimal feature lengths for inferring the phylogeny of prokaryotes, strictly speaking, a proteome phylogeny. Our FFP trees are constructed with whole proteomes of 884 prokaryotes, 16 unicellular eukaryotes, and 2 random sequences. To highlight the branching order of major groups, we present a simplified proteome FFP tree of monophyletic class or phylum with branch support. In our whole-proteome FFP trees (i) Archaea, Bacteria, Eukaryota, and a random sequence outgroup are clearly separated; (ii) Archaea and Bacteria form a sister group when rooted with random sequences; (iii) Planctomycetes, which possesses an intracellular membrane compartment, is placed at the basal position of the Bacteria domain; (iv) almost all groups are monophyletic in prokaryotes at most taxonomic levels, but many differences in the branching order of major groups are observed between our proteome FFP tree and trees built with other methods; and (v) previously “unclassified” genomes may be assigned to the most likely taxa. We describe notable similarities and differences between our FFP trees and those based on other methods in grouping and phylogeny of prokaryotes.


Proceedings of the National Academy of Sciences of the United States of America | 2011

Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs)

Gregory E. Sims; Sung-Hou Kim

A whole-genome phylogeny of the Escherichia coli/Shigella group was constructed by using the feature frequency profile (FFP) method. This alignment-free approach uses the frequencies of l-mer features of whole genomes to infer phylogenic distances. We present two phylogenies that accentuate different aspects of E. coli/Shigella genomic evolution: (i) one based on the compositions of all possible features of length l = 24 (∼8.4 million features), which are likely to reveal the phenetic grouping and relationship among the organisms and (ii) the other based on the compositions of core features with low frequency and low variability (∼0.56 million features), which account for ∼69% of all commonly shared features among 38 taxa examined and are likely to have genome-wide lineal evolutionary signal. Shigella appears as a single clade when all possible features are used without filtering of noncore features. However, results using core features show that Shigella consists of at least two distantly related subclades, implying that the subclades evolved into a single clade because of a high degree of convergence influenced by mobile genetic elements and niche adaptation. In both FFP trees, the basal group of the E. coli/Shigella phylogeny is the B2 phylogroup, which contains primarily uropathogenic strains, suggesting that the E. coli/Shigella ancestor was likely a facultative or opportunistic pathogen. The extant commensal strains diverged relatively late and appear to be the result of reductive evolution of genomes. We also identify clade distinguishing features and their associated genomic regions within each phylogroup. Such features may provide useful information for understanding evolution of the groups and for quick diagnostic identification of each phylogroup.


Proceedings of the National Academy of Sciences of the United States of America | 2009

Whole-genome phylogeny of mammals: Evolutionary information in genic and nongenic regions

Gregory E. Sims; Se-Ran Jun; Guohong Albert Wu; Sung-Hou Kim

Ten complete mammalian genome sequences were compared by using the “feature frequency profile” (FFP) method of alignment-free comparison. This comparison technique reveals that the whole nongenic portion of mammalian genomes contains evolutionary information that is similar to their genic counterparts—the intron and exon regions. We partitioned the complete genomes of mammals (such as human, chimp, horse, and mouse) into their constituent nongenic, intronic, and exonic components. Phylogenic species trees were constructed for each individual component class of genome sequence data as well as the whole genomes by using standard tree-building algorithms with FFP distances. The phylogenies of the whole genomes and each of the component classes (exonic, intronic, and nongenic regions) have similar topologies, within the optimal feature length range, and all agree well with the evolutionary phylogeny based on a recent large dataset, multispecies, and multigene-based alignment. In the strictest sense, the FFP-based trees are genome phylogenies, not species phylogenies. However, the species phylogeny is highly related to the whole-genome phylogeny. Furthermore, our results reveal that the footprints of evolutionary history are spread throughout the entire length of the whole genome of an organism and are not limited to genes, introns, or short, highly conserved, nongenic sequences that can be adversely affected by factors (such as a choice of sequences, homoplasy, and different mutation rates) resulting in inconsistent species phylogenies.


Proceedings of the National Academy of Sciences of the United States of America | 2009

Whole-proteome phylogeny of large dsDNA virus families by an alignment-free method

Guohong Albert Wu; Se-Ran Jun; Gregory E. Sims; Sung-Hou Kim

The vast sequence divergence among different virus groups has presented a great challenge to alignment-based sequence comparison among different virus families. Using an alignment-free comparison method, we construct the whole-proteome phylogeny for a population of viruses from 11 viral families comprising 142 large dsDNA eukaryote viruses. The method is based on the feature frequency profiles (FFP), where the length of the feature (l-mer) is selected to be optimal for phylogenomic inference. We observe that (i) the FFP phylogeny segregates the population into clades, the membership of each has remarkable agreement with current classification by the International Committee on the Taxonomy of Viruses, with one exception that the mimivirus joins the phycodnavirus family; (ii) the FFP tree detects potential evolutionary relationships among some viral families; (iii) the relative position of the 3 herpesvirus subfamilies in the FFP tree differs from gene alignment-based analysis; (iv) the FFP tree suggests the taxonomic positions of certain “unclassified” viruses; and (v) the FFP method identifies candidates for horizontal gene transfer between virus families.


Proceedings of the National Academy of Sciences of the United States of America | 2006

A method for evaluating the structural quality of protein models by using higher-order φ–ψ pairs scoring

Gregory E. Sims; Sung-Hou Kim


Proceedings of the National Academy of Sciences of the United States of America | 2005

Protein conformational space in higher order ϕ-Ψ maps

Gregory E. Sims; In Geol Choi; Sung-Hou Kim


Nucleic Acids Research | 2003

Global mapping of nucleic acid conformational space: dinucleoside monophosphate conformations and transition pathways among conformational classes

Gregory E. Sims; Sung-Hou Kim


Proceedings of the National Academy of Sciences of the United States of America | 2005

Protein conformational space in higher order phi-Psi maps.

Gregory E. Sims; In Geol Choi; Sung-Hou Kim

Collaboration


Dive into the Gregory E. Sims's collaboration.

Top Co-Authors

Avatar

Sung-Hou Kim

University of California

View shared research outputs
Top Co-Authors

Avatar

Se-Ran Jun

Oak Ridge National Laboratory

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Jingtong Hou

Lawrence Berkeley National Laboratory

View shared research outputs
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge