Publication


Featured research published by Jue Ruan.


Bioinformatics | 2009

The Sequence Alignment/Map format and SAMtools

Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor T. Marth; Gonçalo R. Abecasis; Richard Durbin

Summary: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size and efficient in random access, and it is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling and alignment viewing, and thus provides universal tools for processing read alignments. Availability: http://samtools.sourceforge.net
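Each SAM alignment line is tab-delimited, with 11 mandatory columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional TAG:TYPE:VALUE fields. The Python sketch below parses one such line, reads two FLAG bits and decodes the CIGAR string to count the reference bases an alignment spans. It assumes a well-formed alignment line, skips header handling, and converts only integer ('i') tags; it illustrates the format and is not the SAMtools implementation, which is written in C.

```python
import re
from dataclasses import dataclass, field

# The 11 mandatory SAM columns, in order.
MANDATORY = ("qname", "flag", "rname", "pos", "mapq", "cigar",
             "rnext", "pnext", "tlen", "seq", "qual")

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int    # 1-based leftmost mapping position (0 if unavailable)
    mapq: int   # Phred-scaled mapping quality (255 = not available)
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict = field(default_factory=dict)

    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)   # FLAG bit 0x4: segment unmapped

    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)  # FLAG bit 0x10: SEQ is reverse-complemented

    def cigar_ops(self):
        """Decode the CIGAR string into (length, op) pairs; '*' means unavailable."""
        if self.cigar == "*":
            return []
        return [(int(n), op) for n, op in CIGAR_RE.findall(self.cigar)]

    def reference_span(self) -> int:
        """Reference bases covered: only M, D, N, = and X consume the reference."""
        return sum(n for n, op in self.cigar_ops() if op in "MDN=X")


def parse_sam_line(line: str) -> SamRecord:
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    f = dict(zip(MANDATORY, cols[:11]))
    tags = {}
    for item in cols[11:]:
        tag, typ, val = item.split(":", 2)
        tags[tag] = int(val) if typ == "i" else val  # other types kept as strings
    return SamRecord(
        f["qname"], int(f["flag"]), f["rname"], int(f["pos"]), int(f["mapq"]),
        f["cigar"], f["rnext"], int(f["pnext"]), int(f["tlen"]),
        f["seq"], f["qual"], tags)
```

For instance, a record whose CIGAR is 8M2I4M1D3M spans 8 + 4 + 1 + 3 = 16 reference bases: the 2-base insertion consumes read bases but no reference bases.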


Genome Research | 2008

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Heng Li; Jue Ruan; Richard Durbin

New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically a few tens of base pairs long, and using these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ, which can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
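Mapping quality, as later adopted for the MAPQ field of SAM, is a Phred-scaled probability that the reported position is wrong: Q = −10 log10 P(wrong). The Python sketch below computes it from a simplified posterior over candidate positions. It assumes a uniform prior over positions and scores each candidate only by its mismatches, each mismatch at base quality q contributing a factor 10^(−q/10); it omits the mate-pair information and other refinements MAQ uses, so it illustrates the idea rather than reproducing MAQ. The cap of 60 is an arbitrary choice for the example.

```python
import math


def phred(p_wrong: float, cap: float = 60.0) -> float:
    """Phred-scale an error probability: Q = -10 * log10(p_wrong), capped at `cap`."""
    if p_wrong <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(p_wrong))


def mapping_quality(hits):
    """
    hits: one list per candidate reference position, holding the base qualities
    of the read bases that mismatch the reference at that position.
    Returns (index_of_best_hit, mapping_quality).
    Assumptions: uniform prior over candidates; matching bases contribute ~1
    to the likelihood, and each mismatch with base quality q contributes 10^(-q/10).
    """
    # log10 likelihood of each candidate; working in logs avoids underflow.
    log_lik = [-sum(quals) / 10.0 for quals in hits]
    best = max(range(len(hits)), key=lambda i: log_lik[i])
    # Posterior of the best hit = 1 / sum over candidates of 10^(l - l_best).
    total = sum(10.0 ** (l - log_lik[best]) for l in log_lik)
    p_correct = 1.0 / total
    return best, phred(1.0 - p_correct)
```

A read with two equally good placements gets Q of about 3 (a 50% chance of being wrong), while a perfect hit whose only competitor carries one Q30 mismatch gets Q of about 30.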


Genome Research | 2010

De novo assembly of human genomes with massively parallel short read sequencing

Ruiqiang Li; Hongmei Zhu; Jue Ruan; Wubin Qian; Xiaodong Fang; Zhongbin Shi; Yingrui Li; Shengting Li; Gao Shan; Karsten Kristiansen; Songgang Li; Huanming Yang; Jian Wang; Jun Wang

Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the reads are very short, which makes de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving N50 contig sizes of 7.4 and 5.9 kilobases (kb) and N50 scaffold sizes of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
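The N50 figures quoted above are a standard contiguity statistic: sort contigs (or scaffolds) by length, longest first, and report the length at which the running total first reaches half of all assembled bases. A minimal Python sketch of that definition follows; the example lengths are made up.

```python
def n50(lengths):
    """N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding half of an odd total.
        if 2 * running >= total:
            return length
```

For example, n50([2, 3, 4, 5, 6, 7, 8, 9, 10]) returns 8: the three longest contigs (10 + 9 + 8 = 27) reach half of the 54 total bases. The same function applied to scaffold lengths gives the scaffold N50.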


Nature | 2008

The diploid genome sequence of an Asian individual

Jun Wang; Wei Wang; Ruiqiang Li; Yingrui Li; Geng Tian; Laurie Goodman; Wei Fan; Junqing Zhang; Jun Li; Juanbin Zhang; Yiran Guo; Binxiao Feng; Heng Li; Yao Lu; Xiaodong Fang; Huiqing Liang; Z. Du; Dong Li; Yiqing Zhao; Yujie Hu; Zhenzhen Yang; Hancheng Zheng; Ines Hellmann; Michael Inouye; John E. Pool; Xin Yi; Jing Zhao; Jinjie Duan; Yan Zhou; Junjie Qin

Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual’s genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.


Nucleic Acids Research | 2006

TreeFam: a curated database of phylogenetic trees of animal gene families

Heng Li; Avril Coghlan; Jue Ruan; Lachlan Coin; Jean-Karim Hériché; Lara Osmotherly; Ruiqiang Li; Tao Liu; Zhang Zhang; Lars Bolund; Gane Ka-Shu Wong; Wei-Mou Zheng; Paramvir Dehal; Jun Wang; Richard Durbin

TreeFam is a database of phylogenetic trees of gene families found in animals. It aims to develop a curated resource that presents the accurate evolutionary history of all animal gene families, as well as reliable ortholog and paralog assignments. Curated families are being added progressively, based on seed alignments and trees in a similar fashion to Pfam. Release 1.1 of TreeFam contains curated trees for 690 families and automatically generated trees for another 11 646 families. These represent over 128 000 genes from nine fully sequenced animal genomes and over 45 000 other animal proteins from UniProt; ∼40–85% of proteins encoded in the fully sequenced animal genomes are included in TreeFam. TreeFam is freely available online.


Nucleic Acids Research | 2007

TreeFam: 2008 Update

Jue Ruan; Heng Li; Zhongzhong Chen; Avril Coghlan; Lachlan Coin; Yiran Guo; Jean-Karim Hériché; Yafeng Hu; Karsten Kristiansen; Ruiqiang Li; Tao Liu; Alan M. Moses; Junjie Qin; Søren Vang; Albert J. Vilella; Abel Ureta-Vidal; Lars Bolund; Jun Wang; Richard Durbin

TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14 351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new Perl API lets users easily extract data from the TreeFam MySQL database.


Nucleic Acids Research | 2004

SilkDB: a knowledgebase for silkworm biology and genomics

Jing Wang; Qingyou Xia; Ximiao He; Mingtao Dai; Jue Ruan; Jie Chen; Guo Yu; Haifeng Yuan; Yafeng Hu; Ruiqiang Li; Tao Feng; Chen Ye; Cheng Lu; Jun Wang; Songgang Li; Gane Ka-Shu Wong; Huanming Yang; Jian Wang; Zhonghuai Xiang; Zeyang Zhou; Jun Yu

The Silkworm Knowledgebase (SilkDB) is a web-based repository for the curation, integration and study of silkworm genetic and genomic data. With the recent completion of a ∼6X draft genome sequence of the domestic silkworm (Bombyx mori), SilkDB provides an integrated representation of the large-scale, genome-wide sequence assembly, cDNAs, clusters of expressed sequence tags (ESTs), transposable elements (TEs), mutants, single nucleotide polymorphisms (SNPs) and functional annotations of genes with assignments to InterPro domains and Gene Ontology (GO) terms. SilkDB also hosts a set of ESTs from Bombyx mandarina, a wild progenitor of B. mori, and a collection of genes from other Lepidoptera. Comparative analysis results between the domestic and wild silkworm, between B. mori and other Lepidoptera, and between B. mori and the two sequenced insects, fruitfly and mosquito, are displayed using the B. mori genome sequence as a reference framework. Designed as a basic platform, SilkDB strives to provide a comprehensive knowledgebase about the silkworm and to present the silkworm genome and related information in systematic and graphical ways for the convenience of in-depth comparative studies. SilkDB is publicly accessible at http://silkworm.genomics.org.cn.


Proceedings of the National Academy of Sciences of the United States of America | 2011

Rapid growth of a hepatocellular carcinoma and the driving mutations revealed by cell-population genetic analysis of whole-genome data

Yong Tao; Jue Ruan; Shiou-Hwei Yeh; Xuemei Lu; Yu Wang; Weiwei Zhai; Jun Cai; Shaoping Ling; Qiang Gong; Zecheng Chong; Zhengzhong Qu; Qianqian Li; Jiang Liu; Jin Yang; Caihong Zheng; Changqing Zeng; Hurng-Yi Wang; Jing Zhang; Sheng-Han Wang; Lingtong Hao; Lili Dong; Wenjie Li; Min Sun; Wei Zou; Caixia Yu; Chaohua Li; Guojing Liu; Lan Jiang; Jin Xu; Huanwei Huang

We present the analysis of the evolution of tumors in a case of hepatocellular carcinoma. This case is particularly informative about cancer growth dynamics and the underlying driving mutations. We sampled nine different sections from three tumors and seven more sections from the adjacent nontumor tissues. Selected sections were subjected to exon as well as whole-genome sequencing. Putative somatic mutations were then individually validated across all 9 tumor and 7 nontumor sections. Among the mutations validated, 24 were amino acid changes; in addition, 22 large indels/copy number variants (>1 Mb) were detected. These somatic mutations define four evolutionary lineages among tumor cells. Separate evolution and expansion of these lineages were recent and rapid, each apparently having only one lineage-specific protein-coding mutation. Hence, by using a cell-population genetic definition, this approach identified three coding changes (CCNG1, P62, and an indel/fusion gene) as tumor driver mutations. These three mutations, affecting cell cycle control and apoptosis, are functionally distinct from mutations that accumulated earlier, many of which are involved in inflammation/immunity or cell anchoring. These distinct functions of mutations at different stages may reflect the genetic interactions underlying tumor growth.


Bioinformatics | 2012

Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads

Zechen Chong; Jue Ruan; Chung-I Wu

MOTIVATION The restriction-site associated DNA sequencing (RAD-seq) method takes full advantage of next-generation sequencing technology. By clustering paired-end short reads into groups according to their unique tags, the RAD-seq assembly problem is divided into subproblems. Quickly and accurately clustering and assembling millions of RAD-seq reads that contain sequencing errors, different levels of heterozygosity and repetitive sequences is challenging. RESULTS Rainbow was developed to provide an ultra-fast and memory-efficient solution for clustering and assembling short reads produced by RAD-seq. First, Rainbow clusters reads using a spaced-seed method. Then, it applies a strategy similar to heterozygote calling to divide potential groups into haplotypes in a top-down manner. Along a guide tree, it then iteratively merges sibling leaves in a bottom-up manner if they are sufficiently similar, where similarity is defined by comparing the second reads of a RAD segment. This approach aims to collapse heterozygous variants while distinguishing repetitive sequences. Finally, Rainbow uses a greedy algorithm to assemble the merged reads locally into contigs, and it outputs both optimal and suboptimal assembly results. Using simulated data and a real guppy RAD-seq dataset, we show that Rainbow outperforms the other tools compared in handling RAD-seq data. AVAILABILITY Rainbow is written in C; its source code is freely available at http://sourceforge.net/projects/bio-rainbow/files/
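The spaced-seed clustering step mentioned above can be illustrated with a small sketch. A spaced seed is a binary mask: only the bases at '1' positions form the lookup key, so two reads that differ only at '0' positions still land in the same bucket. This Python sketch groups reads that share a key under any of several masks and joins the buckets with union-find. The masks, the use of only the first len(mask) bases of each read, and the single-linkage grouping are assumptions made for illustration; the abstract does not specify Rainbow's seed pattern or data structures.

```python
from collections import defaultdict


def seed_key(seq: str, mask: str) -> str:
    """Keep only the bases at positions where the mask has '1'.
    zip() stops at the shorter input, so only the first len(mask) bases are used."""
    return "".join(base for base, bit in zip(seq, mask) if bit == "1")


def cluster_by_spaced_seeds(reads, masks):
    """Single-linkage grouping: reads sharing a seed key under at least one
    mask end up in the same cluster. Several masks with different '0'
    positions let reads carrying a sequencing error still meet under some mask."""
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for mask in masks:
        buckets = defaultdict(list)
        for idx, read in enumerate(reads):
            if len(read) >= len(mask):  # skip reads too short for this mask
                buckets[seed_key(read, mask)].append(idx)
        for members in buckets.values():
            for other in members[1:]:
                union(members[0], other)

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())
```

With masks "11110000" and "00001111", the reads "ACGTACGT" and "ACGTACGA" (one mismatch in the last base) share the key "ACGT" under the first mask and are placed in one cluster, while "TTTTGGGG" stays on its own.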


Nucleic Acids Research | 2004

ChickVD: a sequence variation database for the chicken genome

Jing Wang; Ximiao He; Jue Ruan; Mingtao Dai; Jie Chen; Yong Zhang; Yafeng Hu; Chen Ye; Shengting Li; Lijuan Cong; Lin Fang; Bin Liu; Songgang Li; Jian Wang; David W. Burt; Gane Ka-Shu Wong; Jun Yu; Huanming Yang; Jun Wang

Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DNA from domestic breeds. Using the Red Jungle Fowl genome sequence as a reference, we identified 3.1 million non-redundant DNA sequence variants. To facilitate the application of our data to avian genetics and to provide a foundation for functional and evolutionary studies, we created the ‘Chicken Variation Database’ (ChickVD). A graphical MapView shows variants mapped onto the chicken genome in the context of gene annotations and other features, including genetic markers, trait loci, cDNAs, chicken orthologs of human disease genes and raw sequence traces. ChickVD also stores information on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried through a search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn.

Collaboration


Jue Ruan's collaborations.

Top Co-Authors

Ruiqiang Li

Chinese Academy of Sciences

Xuemei Lu

Beijing Institute of Genomics

Hongmei Zhu

Beijing Institute of Genomics

Huanming Yang

Chinese Academy of Sciences

Jian Wang

Guangzhou Medical University

Jun Wang

Chinese Academy of Sciences

Kaile Wang

Beijing Institute of Genomics

Lan Jiang

Chinese Academy of Sciences
