Publication


Featured research published by Giltae Song.


Research in Computational Molecular Biology | 2008

Reconstructing the evolutionary history of complex human gene clusters

Yu Zhang; Giltae Song; Tomáš Vinař; Eric D. Green; Adam Siepel; Webb Miller

Clusters of genes that evolved from single progenitors via repeated segmental duplications present significant challenges to the generation of a truly complete human genome sequence. Such clusters can confound both accurate sequence assembly and downstream computational analysis, yet they represent a hotbed of functional innovation, making them of extreme interest. We have developed an algorithm for reconstructing the evolutionary history of gene clusters using only human genomic sequence data. This method allows the tempo of large-scale evolutionary events in human gene clusters to be estimated, which in turn will facilitate primate comparative sequencing studies that will aim to reconstruct their evolutionary history more fully.


BMC Bioinformatics | 2011

Evaluation of methods for detecting conversion events in gene clusters

Giltae Song; Chih-Hao Hsu; Cathy Riemer; Webb Miller

Background: Gene clusters are genetically important, but their analysis poses significant computational challenges. One of the major reasons for these difficulties is gene conversion among the duplicated regions of the cluster, which can obscure their true relationships. Many computational methods for detecting gene conversion events have been released, but their performance has not been assessed for wide deployment in evolutionary history studies due to a lack of accurate evaluation methods. Results: We designed a new method that simulates gene cluster evolution, including large-scale events of duplication, deletion, and conversion as well as small mutations. We used this simulation data to evaluate several different programs for detecting gene conversion events. Conclusions: Our evaluation identifies strengths and weaknesses of several methods for detecting gene conversion, which can contribute to more accurate analysis of gene cluster evolution.


Journal of Computational Biology | 2010

CAGE: Combinatorial Analysis of Gene-Cluster Evolution

Giltae Song; Louxin Zhang; Tomas Vinar; Webb Miller

Much important evolutionary activity occurs in gene clusters, where a copy of a gene may be free to acquire new functions. Current computational methods to extract evolutionary information from sequence data for such clusters are suboptimal, in part because accurate sequence data are often lacking in these genomic regions, making existing methods difficult to apply. We describe a new method for reconstructing the recent evolutionary history of gene clusters, and evaluate its performance on both simulated data and actual human gene clusters.


BMC Evolutionary Biology | 2011

Conversion events in gene clusters

Giltae Song; Chih-Hao Hsu; Cathy Riemer; Yu Zhang; Hie Lim Kim; Federico G. Hoffmann; Louxin Zhang; Ross C. Hardison; Eric D. Green; Webb Miller

Background: Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments. Results: To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available at http://www.bx.psu.edu/miller_lab. Conclusions: These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes.


Genome Biology and Evolution | 2012

Revealing mammalian evolutionary relationships by comparative analysis of gene clusters

Giltae Song; Cathy Riemer; Benjamin Dickins; Hie Lim Kim; Louxin Zhang; Yu Zhang; Chih-Hao Hsu; Ross C. Hardison; NISC Comparative Sequencing Program; Eric D. Green; Webb Miller

Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller_lab.
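The ortholog/paralog distinction in the abstract above can be illustrated with a minimal Python sketch. It assumes a toy gene tree whose internal nodes are already labeled as speciation or duplication events; the `Node` class, the `relationship` function, and the gene names are hypothetical and are not part of CHAP 2. The sketch covers only the textbook definition and does not model conversion events, so it does not capture the X-orthology versus N-orthology refinement the paper proposes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node in a toy gene tree (hypothetical structure for illustration)."""
    name: str
    event: Optional[str] = None      # "speciation" or "duplication"; None for leaves
    parent: Optional["Node"] = None


def ancestors(node):
    """Return the path from a node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def relationship(a, b):
    """Classify two distinct leaf genes by the event at their most recent common ancestor."""
    if a is b:
        raise ValueError("need two distinct genes")
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            # The first shared ancestor on the way up is the most recent one.
            return "orthologs" if n.event == "speciation" else "paralogs"
    raise ValueError("genes are not in the same tree")


# Toy history: one ancestral duplication, then a speciation in each copy.
root = Node("dup", "duplication")
spec_a = Node("spec_A", "speciation", root)
spec_b = Node("spec_B", "speciation", root)
human_a = Node("human_A", parent=spec_a)
chimp_a = Node("chimp_A", parent=spec_a)
human_b = Node("human_B", parent=spec_b)

print(relationship(human_a, chimp_a))  # orthologs
print(relationship(human_a, human_b))  # paralogs
```

In this toy tree, the two copies in different species that descend from the same speciation node are orthologs, while copies separated by the ancestral duplication are paralogs regardless of species.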


Journal of Computational Biology | 2009

Evolutionary History Reconstruction for Mammalian Complex Gene Clusters

Yu Zhang; Giltae Song; Tomas Vinar; Eric D. Green; Adam Siepel; Webb Miller

Clusters of genes that evolved from single progenitors via repeated segmental duplications present significant challenges to the generation of a truly complete human genome sequence. Such clusters can confound both accurate sequence assembly and downstream computational analysis, yet they represent a hotbed of functional innovation, making them of extreme interest. We have developed an algorithm for reconstructing the evolutionary history of gene clusters using only human genomic sequence data, which allows the tempo of large-scale evolutionary events in human gene clusters to be estimated. We further propose an extension of the method to simultaneously reconstructing the evolutionary histories of orthologous gene clusters in multiple primates, which will facilitate primate comparative sequencing studies that aim to reconstruct their evolutionary history more fully.


Research in Computational Molecular Biology | 2009

Inferring the Recent Duplication History of a Gene Cluster

Giltae Song; Louxin Zhang; Tomáš Vinař; Webb Miller

Much important evolutionary activity occurs in gene clusters, where a copy of a gene may be free to evolve new functions. Computational methods to extract evolutionary information from sequence data for such clusters are currently imperfect, in part because accurate sequence data are often lacking in these genomic regions, making the existing methods difficult to apply. We describe a new method for reconstructing the recent evolutionary history of gene clusters. The method's performance is evaluated on simulated data and on actual human gene clusters.


PLOS ONE | 2015

Correction: AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae

Giltae Song; Benjamin Dickins; Janos Demeter; Stacia R. Engel; Jennifer E. Gallagher; Kisurb Choe; Barbara Dunn; Michael Snyder; J. Michael Cherry

Jennifer Gallagher, Kisurb Choe, and Michael Snyder are missing from the author list. Please view the correct author order, affiliations, and citation here:

Giltae Song 1, Benjamin J. A. Dickins 2, Janos Demeter 1, Stacia Engel 1, Jennifer Gallagher 1, Kisurb Choe 1, Barbara Dunn 1, Michael Snyder 1, J. Michael Cherry 1

1 Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America, 2 School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom

Song G, Dickins BJA, Demeter J, Engel S, Gallagher J, Choe K, et al. (2015) AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae. PLoS ONE 10(3): e0120671. doi:10.1371/journal.pone.0120671

There are missing Author Contributions. The correct contributions are: Conceived and designed the experiments: GS MS JMC. Performed the experiments: GS JG KC BD. Analyzed the data: GS BJAD JD. Contributed reagents/materials/analysis tools: GS JG KC BD. Wrote the paper: GS BJAD JD SE BD JMC.

There is an omission in the Acknowledgments. The following sentence should be included in the Acknowledgments: We thank SGD Project staff for the creation of the high quality and detailed database of S. cerevisiae genes and their products and Webb Miller for helpful comments. Illumina sequencing services were performed by the Stanford Center for Genomics and Personalized Medicine.


Research in Computational Molecular Biology | 2009

Reconstructing Histories of Complex Gene Clusters on a Phylogeny

Tomáš Vinař; Broňa Brejová; Giltae Song; Adam Siepel

Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. Improved understanding of these clusters is of utmost importance, since they have been shown to be the source of evolutionary innovation, and have been linked to multiple diseases, including HIV and a variety of cancers. Previously, Zhang et al. (2008) developed an algorithm for reconstructing parsimonious evolutionary histories of such gene clusters, using only human genomic sequence data. In this paper, we propose a probabilistic model for the evolution of gene clusters on a phylogeny, and an MCMC algorithm for reconstruction of duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate that our method will be useful in analyzing these valuable new data sets.

Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. These clusters are one of the major sources of evolutionary innovation, and they are linked to multiple diseases, including HIV and a variety of cancers. Understanding their evolutionary histories is a key to the application of comparative genomics methods in these regions of the genome. We propose a probabilistic model of gene cluster evolution on a phylogeny, and an MCMC algorithm for reconstruction of duplication histories from genomic sequences in multiple species. Several projects are underway to obtain high quality BAC-based assemblies of duplicated clusters in multiple species, and we anticipate use of our methods in their analysis. Supplementary materials are located at http://compbio.fmph.uniba.sk/suppl/09recombcg/


bioRxiv | 2018

Comprehensive, Integrated, and Phased Whole-Genome Analysis of the Primary ENCODE Cell Line K562

Bo Zhou; Steve S. Ho; Xiaowei Zhu; Xianglong Zhang; Noah Spies; Seunggyu Byeon; Joseph G. Arthur; Reenal Pattni; Noa Ben-Efraim; Michael S. Haney; Rajini R Haraksingh; Giltae Song; Dimitri Perrin; Wing Hung Wong; Alexej Abyzov; Alexander E. Urban

K562 is one of the most widely used cell lines in biomedical research. It is one of three tier-one cell lines of ENCODE, and one of the cell lines most commonly used for large-scale CRISPR/Cas9 gene-editing screens. Although the functional genomic and epigenomic characteristics of K562 are extensively studied, its genome sequence has never been comprehensively analyzed, and higher-order structural features of its genome beyond its karyotype were only cursorily known. The high degree of aneuploidy in K562 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from K562 requires an understanding of the cell line’s genome sequence and genome structure. We performed deep short-insert whole-genome sequencing, mate-pair sequencing, linked-read sequencing, karyotyping, and array CGH and used a combination of novel and established computational methods to identify and catalog a wide spectrum of genome sequence variants and genome structural features in K562: copy numbers (CN) by chromosome segments, SNVs and Indels (allele frequency-corrected by CN), phased haplotype blocks (N50 = 2.72 Mb), structural variants (SVs) including complex genomic rearrangements, and novel mobile element insertions. A large fraction of SVs was also phased, sequence assembled, and experimentally validated. Many chromosomes show striking loss of heterozygosity. To demonstrate the utility of this knowledge, we re-analyzed K562 RNA-Seq and whole-genome bisulfite sequencing data to detect and phase allele-specific expression and DNA methylation patterns, respectively. Furthermore, we used the haplotype information to produce a phased CRISPR targeting map, i.e. a catalog of loci where CRISPR guide RNAs will bind in an allele-specific manner. Finally, we show examples where deeper insights into genomic regulatory complexity could be gained by taking knowledge of genomic structural contexts into account. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 and as the basis of advanced analyses of the rich amounts of the functional genomics data produced by ENCODE for K562. It is also an example for advanced, integrated whole-genome sequence and structure analysis, beyond standard short-read/short-insert whole-genome sequencing, of human genomes in general and in particular of cancer genomes with large numbers of complex sequence alterations.
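The abstract above reports an N50 for phased haplotype blocks. As a rough illustration of what that statistic means, here is a minimal Python sketch of the usual N50 definition: the length of the block at which the running total, taken from longest to shortest, first reaches half of the summed length. The function name `n50` and the block lengths are hypothetical and are not taken from the paper, and the authors' exact computation may differ.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    Blocks are visited from longest to shortest; the N50 is the length of
    the block at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("need at least one block length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical haplotype block lengths in base pairs, for illustration only.
blocks = [5_000_000, 3_000_000, 2_000_000, 1_000_000, 500_000]
print(n50(blocks))  # 3000000
```

With these toy values the total is 11.5 Mb, and the two longest blocks together (8 Mb) are the first to cover at least half of it, so the N50 is the length of the second block.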

Collaboration


Dive into Giltae Song's collaborations.

Top Co-Authors

Webb Miller

Pennsylvania State University

Yu Zhang

Pennsylvania State University

Adam Siepel

Cold Spring Harbor Laboratory

Chih-Hao Hsu

National Institutes of Health

Eric D. Green

National Institutes of Health

Louxin Zhang

National University of Singapore

Cathy Riemer

Pennsylvania State University

Tomas Vinar

Comenius University in Bratislava

Tomáš Vinař

Comenius University in Bratislava
