Jaebum Kim | Researchain

Publication

Featured researches published by Jaebum Kim.

Nature Genetics | 2012

The yak genome and adaptation to life at high altitude

Qiang Qiu; Guojie Zhang; Tao Ma; Wubin Qian; Wang J; Zhiqiang Ye; Changchang Cao; Quanjun Hu; Jaebum Kim; Denis M. Larkin; Loretta Auvil; Boris Capitanu; Jian Ma; Harris A. Lewin; Xiaoju Qian; Yongshan Lang; Ran Zhou; Lizhong Wang; Kun Wang; Jinquan Xia; Shengguang Liao; Shengkai Pan; Xu Lu; Haolong Hou; Yan Wang; Xuetao Zang; Ye Yin; Hui Ma; Jian Zhang; Zhaofeng Wang

Domestic yaks (Bos grunniens) provide meat and other necessities for Tibetans living at high altitude on the Qinghai-Tibetan Plateau and in adjacent regions. Comparison between yak and the closely related low-altitude cattle (Bos taurus) is informative in studying animal adaptation to high altitude. Here, we present the draft genome sequence of a female domestic yak generated using Illumina-based technology at 65-fold coverage. Genomic comparisons between yak and cattle identify an expansion in yak of gene families related to sensory perception and energy metabolism, as well as an enrichment of protein domains involved in sensing the extracellular environment and hypoxic stress. Positively selected and rapidly evolving genes in the yak lineage are also found to be significantly enriched in functional categories and pathways related to hypoxia and nutrition metabolism. These findings may have important implications for understanding adaptation to high altitude in other animal species and for hypoxia-related diseases in humans.

Nature Communications | 2013

Draft genome sequence of the Tibetan antelope.

Ri-Li Ge; Qingle Cai; Yong-Yi Shen; Asan; Lan Ma; Yong Zhang; Xin Yi; Yan Chen; Lingfeng Yang; Ying Huang; Rongjun He; Yuanyuan Hui; Meirong Hao; Yue Li; Bo Wang; Xiaohua Ou; Jiaohui Xu; Yongfen Zhang; Kui Wu; Chunyu Geng; Wei-Ping Zhou; Taicheng Zhou; David M. Irwin; Yingzhong Yang; Liu Ying; Jaebum Kim; Denis M. Larkin; Jian Ma; Harris A. Lewin; Jinchuan Xing

The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation.

PLOS Genetics | 2009

Evolution of regulatory sequences in 12 Drosophila species.

Jaebum Kim; Xin-Xin He; Saurabh Sinha

Characterization of the evolutionary constraints acting on cis-regulatory sequences is crucial to comparative genomics and provides key insights on the evolution of organismal diversity. We study the relationships among orthologous cis-regulatory modules (CRMs) in 12 Drosophila species, especially with respect to the evolution of transcription factor binding sites, and report statistical evidence in favor of key evolutionary hypotheses. Binding sites are found to have position-specific substitution rates. However, the selective forces at different positions of a site do not act independently, and the evidence suggests that constraints on sites are often based on their exact binding affinities. Binding site loss is seen to conform to a molecular clock hypothesis. The rate of site loss is transcription factor–specific and depends on the strength of binding and, in some cases, the presence of other binding sites in close proximity. Our analysis is based on a novel computational method for aligning orthologous CRMs on a tree, which rigorously accounts for alignment uncertainties and exploits binding site predictions through a unified probabilistic framework. Finally, we report weak purifying selection on short deletions, providing important clues about overall spatial constraints on CRMs. Our results present a complex picture of regulatory sequence evolution, with substantial plasticity that depends on a number of factors. The insights gained in this study will help us to understand the combinatorial control of gene regulation and how it evolves. They will pave the way for theoretical models that are cognizant of the important determinants of regulatory sequence evolution and will be critical in genome-wide identification of non-coding sequences under purifying or positive selection.

Nature Communications | 2014

Genome-wide adaptive complexes to underground stresses in blind mole rats Spalax

Xiaodong Fang; Eviatar Nevo; Lijuan Han; Erez Y. Levanon; Jing Zhao; Aaron Avivi; Denis M. Larkin; Xuanting Jiang; Sergey Feranchuk; Yabing Zhu; Alla Fishman; Yue Feng; Noa Sher; Zhiqiang Xiong; Thomas Hankeln; Zhiyong Huang; Vera Gorbunova; Lu Zhang; Wei Zhao; Derek E. Wildman; Yingqi Xiong; Andrei V. Gudkov; Qiumei Zheng; Gideon Rechavi; Sanyang Liu; Lily Bazak; Jie Chen; Binyamin A. Knisbacher; Yao Lu; Imad Shams

The blind mole rat (BMR), Spalax galili, is an excellent model for studying mammalian adaptation to life underground and for medical applications. The BMR spends its entire life underground, which protects it from predators and climatic fluctuations but exposes it to multiple stressors such as darkness, hypoxia, hypercapnia, energetics and high pathogenicity. Here we sequence and analyse the BMR genome and transcriptome, highlighting the possible genomic adaptive responses to the underground stressors. Our results show high rates of RNA/DNA editing, reduced chromosome rearrangements, an over-representation of short interspersed elements (SINEs) probably linked to hypoxia tolerance, degeneration of vision and progression of photoperiodic perception, tolerance to hypercapnia and hypoxia and resistance to cancer. The remarkable traits of the BMR, together with its genomic and transcriptomic information, enhance our understanding of adaptation to extreme environments and will enable the utilization of BMR models for biomedical research in the fight against cancer, stroke and cardiovascular diseases.

Proceedings of the National Academy of Sciences of the United States of America | 2013

Reference-assisted chromosome assembly

Jaebum Kim; Denis M. Larkin; Qingle Cai; Asan; Yongfen Zhang; Ri-Li Ge; Loretta Auvil; Boris Capitanu; Guojie Zhang; Harris A. Lewin; Jian Ma

One of the most difficult problems in modern genomics is the assembly of full-length chromosomes using next generation sequencing (NGS) data. To address this problem, we developed “reference-assisted chromosome assembly” (RACA), an algorithm to reliably order and orient sequence scaffolds generated by NGS and assemblers into longer chromosomal fragments using comparative genome information and paired-end reads. Evaluation of results using simulated and real genome assemblies indicates that our approach can substantially improve genomes generated by a wide variety of de novo assemblers if a good reference assembly of a closely related species and outgroup genomes are available. We used RACA to reconstruct 60 Tibetan antelope (Pantholops hodgsonii) chromosome fragments from 1,434 SOAPdenovo sequence scaffolds, of which 16 chromosome fragments were homologous to complete cattle chromosomes. Experimental validation by PCR showed that predictions made by RACA are highly accurate. Our results indicate that RACA will significantly facilitate the study of chromosome evolution and genome rearrangements for the large number of genomes being sequenced by NGS that do not have a genetic or physical map.

Genome Research | 2014

Alignathon: A competitive assessment of whole genome alignment methods

Dent Earl; Ngan Nguyen; Glenn Hickey; Robert S. Harris; Stephen Fitzgerald; Kathryn Beal; Seledtsov I; Molodtsov; Brian J. Raney; Hiram Clawson; Jaebum Kim; Carsten Kemena; Jia-Ming Chang; Ionas Erb; Poliakov A; Minmei Hou; Javier Herrero; William Kent; Solovyev; Aaron E. Darling; Jian Ma; Cedric Notredame; Michael Brudno; Inna Dubchak; David Haussler; Benedict Paten

Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

Nucleic Acids Research | 2011

PSAR: measuring multiple sequence alignment reliability by probabilistic sampling

Jaebum Kim; Jian Ma

Multiple sequence alignment, which is of fundamental importance for comparative genomics, is a difficult and error-prone problem. Therefore, it is essential to measure the reliability of the alignments and incorporate it into downstream analyses. We propose a new probabilistic sampling-based alignment reliability (PSAR) score. Instead of relying on heuristic assumptions, such as the correlation between alignment quality and guide tree uncertainty in progressive alignment methods, we directly generate suboptimal alignments from an input multiple sequence alignment by a probabilistic sampling method, and compute the agreement of the input alignment with the suboptimal alignments as the alignment reliability score. We construct the suboptimal alignments by an approximate method that is based on pairwise comparisons between each single sequence and the sub-alignment of the input alignment where the chosen sequence is left out. By using simulation-based benchmarks, we find that our approach is superior to existing ones, supporting the view that the suboptimal alignments are a highly informative source for assessing alignment reliability. We apply the PSAR method to the alignments in the UCSC Genome Browser to measure the reliability of alignments in different types of regions, such as coding exons and conserved non-coding regions, and use it to guide cross-species conservation studies.
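The agreement step described in this abstract can be illustrated with a minimal sketch. It is not the PSAR implementation: it assumes the sampled suboptimal alignments are already available (the probabilistic sampling itself is not shown), and it measures agreement as the fraction of aligned residue pairs shared with each sample, which is one plausible choice. The function names `aligned_pairs` and `agreement_score` are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length strings using '-' for gaps.
    Each pair is (seq_i, residue_i, seq_j, residue_j), where residue
    indices count non-gap characters within their own sequence.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows must have the same length")
    counters = [0] * len(alignment)  # next residue index per sequence
    pairs = set()
    for col in range(n_cols):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        for (si, ri), (sj, rj) in combinations(present, 2):
            pairs.add((si, ri, sj, rj))
    return pairs


def agreement_score(input_alignment, sampled_alignments):
    """Average fraction of input residue pairs found in each sample.

    Assumption: this pair-based agreement stands in for the score
    described in the abstract; the published score may be defined
    differently (for example, per column).
    """
    reference = aligned_pairs(input_alignment)
    if not reference or not sampled_alignments:
        return 0.0
    total = 0.0
    for sample in sampled_alignments:
        total += len(reference & aligned_pairs(sample)) / len(reference)
    return total / len(sampled_alignments)


if __name__ == "__main__":
    given = ["AC-G", "ACTG"]
    samples = [["AC-G", "ACTG"], ["ACG-", "ACTG"]]
    print(agreement_score(given, samples))
```

A score near 1 means the sampled alignments mostly reproduce the input alignment; lower values flag regions where alternative placements are common.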

PLOS Computational Biology | 2010

Functional Characterization of Transcription Factor Motifs Using Cross-species Comparison across Large Evolutionary Distances

Jaebum Kim; Ryan J. Cunningham; Brian James; Stefan Wyder; Joshua D. Gibson; Oliver Niehuis; Evgeny M. Zdobnov; Hugh M. Robertson; Gene E. Robinson; John H. Werren; Saurabh Sinha

We address the problem of finding statistically significant associations between cis-regulatory motifs and functional gene sets, in order to understand the biological roles of transcription factors. We develop a computational framework for this task, whose features include a new statistical score for motif scanning, the use of different scores for predicting targets of different motifs, and new ways to deal with redundancies among significant motif–function associations. This framework is applied to the recently sequenced genome of the jewel wasp, Nasonia vitripennis, making use of the existing knowledge of motifs and gene annotations in another insect genome, that of the fruitfly. The framework uses cross-species comparison to improve the specificity of its predictions, and does so without relying upon non-coding sequence alignment. It is therefore well suited for comparative genomics across large evolutionary divergences, where existing alignment-based methods are not applicable. We also apply the framework to find motifs associated with socially regulated gene sets in the honeybee, Apis mellifera, using comparisons with Nasonia, a solitary species, to identify honeybee-specific associations.
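As a rough illustration of what a motif–function association test looks like, the sketch below uses a standard one-sided hypergeometric test. This is a commonly used stand-in, not the new statistical score the abstract refers to, and the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genes, n_targets, n_in_set, n_overlap):
    """P(overlap >= n_overlap) under a hypergeometric null.

    n_genes   : total number of genes considered
    n_targets : genes predicted as targets of the motif
    n_in_set  : genes in the functional gene set
    n_overlap : genes that are both targets and in the set
    """
    if not (0 <= n_targets <= n_genes and 0 <= n_in_set <= n_genes):
        raise ValueError("counts must lie between 0 and n_genes")
    max_overlap = min(n_targets, n_in_set)
    if not 0 <= n_overlap <= max_overlap:
        raise ValueError("n_overlap is out of range")
    total = comb(n_genes, n_in_set)
    # Sum the upper tail: draws of the gene set containing k targets.
    tail = sum(
        comb(n_targets, k) * comb(n_genes - n_targets, n_in_set - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    print(hypergeom_enrichment_pvalue(10, 5, 5, 5))
```

In practice many motifs and gene sets are tested at once, so the resulting p-values would need a multiple-testing correction before being called significant.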

DNA Research | 2015

Genome-wide analysis of DNA methylation in pigs using reduced representation bisulfite sequencing

Min-Kyeung Choi; Jongin Lee; Min Thong Le; Dinh Truong Nguyen; Suhyun Park; Nagasundarapandian Soundrarajan; Kyle M. Schachtschneider; Jaebum Kim; Jin-Ki Park; Jin-Hoi Kim; Chankyu Park

DNA methylation plays a major role in the epigenetic regulation of gene expression. Although a few DNA methylation profiling studies of the porcine genome, an important biomedical model for human diseases, have been reported, the available data are still limited. As part of the International Swine Methylome Consortium, we studied the methylation patterns of diverse pig tissues to generate a swine reference methylome map and to evaluate the methylation profile of the pig genome at single-base resolution. We generated and analysed the DNA methylome profiles of five different tissues and one cell line of pig origin. On average, 39.85% of the cytosine-guanine dinucleotides (CpGs) in CpG islands and 62.1% of those within 2 kb upstream of transcription start sites were covered. We detected a low rate (an average of 1.67%) of non-CpG methylation in the six samples except for the neocortex (2.3%). The observed global CpG methylation patterns of pigs indicated high similarity to other mammals including humans. The percentage of CpG methylation associated with gene features was similar among the tissues but not in the 3D4/2 cell line. Our results provide essential information for future studies of the porcine epigenome.

BMC Bioinformatics | 2010

Towards realistic benchmarks for multiple alignments of non-coding sequences

Jaebum Kim; Saurabh Sinha

Background: With the continued development of new computational tools for multiple sequence alignment, it is necessary today to develop benchmarks that aid the selection of the most effective tools. Simulation-based benchmarks have been proposed to meet this necessity, especially for non-coding sequences. However, it is not clear if such benchmarks truly represent real sequence data from any given group of species, in terms of the difficulty of alignment tasks. Results: We find that the conventional simulation approach, which relies on empirically estimated values for various parameters such as substitution rate or insertion/deletion rates, is unable to generate synthetic sequences reflecting the broad genomic variation in conservation levels. We tackle this problem with a new method for simulating non-coding sequence evolution, by relying on genome-wide distributions of evolutionary parameters rather than their averages. We then generate synthetic data sets to mimic orthologous sequences from the Drosophila group of species, and show that these data sets truly represent the variability observed in genomic data in terms of the difficulty of the alignment task. This allows us to make significant progress towards estimating the alignment accuracy of current tools in an absolute sense, going beyond only a relative assessment of different tools. We evaluate six widely used multiple alignment tools in the context of Drosophila non-coding sequences, and find the accuracy to be significantly different from previously reported values. Interestingly, the performance of most tools degrades more rapidly when there are more insertions than deletions in the data set, suggesting an asymmetric handling of insertions and deletions, even though none of the evaluated tools explicitly distinguishes these two types of events. We also examine the accuracy of two existing tools for annotating insertions versus deletions, and find their performance to be close to optimal in Drosophila non-coding sequences if provided with the true alignments. Conclusion: We have developed a method to generate benchmarks for multiple alignments of Drosophila non-coding sequences, and shown it to be more realistic than traditional benchmarks. Apart from helping to select the most effective tools, these benchmarks will help practitioners of comparative genomics deal with the effects of alignment errors, by providing accurate estimates of the extent of these errors.
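The contrast between simulating with an average parameter and simulating from a genome-wide distribution can be sketched as follows. This is a toy model under stated assumptions, not the authors' simulator: it handles substitutions only (no insertions or deletions), treats `observed_rates` as a hypothetical list of per-region substitution rates, and uses the hypothetical function names `mutate` and `simulate`.

```python
import random

BASES = "ACGT"


def mutate(sequence, rate, rng):
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in sequence:
        if rng.random() < rate:
            # Always pick a base different from the current one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(ancestors, observed_rates, rng, use_distribution=True):
    """Evolve each ancestral sequence into one descendant.

    With use_distribution=True, every sequence gets its own rate drawn
    from `observed_rates`, so conservation varies between sequences.
    With use_distribution=False, the mean rate is used for all of them.
    """
    mean_rate = sum(observed_rates) / len(observed_rates)
    descendants = []
    for ancestor in ancestors:
        rate = rng.choice(observed_rates) if use_distribution else mean_rate
        descendants.append(mutate(ancestor, rate, rng))
    return descendants


if __name__ == "__main__":
    rng = random.Random(0)
    ancestors = ["ACGTACGTACGTACGTACGT"] * 4
    rates = [0.01, 0.05, 0.2, 0.6]  # illustrative values only
    for seq in simulate(ancestors, rates, rng):
        print(seq)
```

Drawing a separate rate per sequence yields a mix of highly conserved and highly diverged regions, whereas the mean-rate mode makes every region about equally hard to align.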
