Publication


Featured research published by Depeng Wang.


Nature Communications | 2016

Long-read sequencing and de novo assembly of a Chinese genome

Lingling Shi; Yunfei Guo; Chengliang Dong; John Huddleston; Hui Yang; Xiaolu Han; Aisi Fu; Quan Li; Na Li; Siyi Gong; Katherine E Lintner; Qiong Ding; Zou Wang; Jiang Hu; Depeng Wang; Feng Wang; Lin Wang; Gholson J. Lyon; Yongtao Guan; Yufeng Shen; Oleg V. Evgrafov; James A. Knowles; Françoise Thibaud-Nissen; Valerie Schneider; Chack Yung Yu; Libing Zhou; Evan E. Eichler; Kf So; Kai Wang

Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
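The contig and scaffold N50 values quoted in this abstract follow the standard assembly metric: the length L such that sequences of length L or longer contain at least half of all assembled bases. A minimal sketch of that calculation is given here; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mb, invented for illustration only.
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))  # prints 8
```

With the toy input, the total is 30 Mb; the two longest contigs (10 + 8) already exceed half of that, so the N50 is 8.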


Genome Medicine | 2017

Interrogating the “unsequenceable” genomic trinucleotide repeat disorders by long-read sequencing

Qian Liu; Peng Zhang; Depeng Wang; Weihong Gu; Kai Wang

Microsatellite expansion, such as trinucleotide repeat expansion (TRE), is known to cause a number of genetic diseases. Sanger sequencing and next-generation short-read sequencing are unable to interrogate TRE reliably. We developed a novel algorithm called RepeatHMM to estimate repeat counts from long-read sequencing data. Evaluation on simulation data, real amplicon sequencing data on two repeat expansion disorders, and whole-genome sequencing data generated by PacBio and Oxford Nanopore technologies showed superior performance over competing approaches. We concluded that long-read sequencing coupled with RepeatHMM can estimate repeat counts on microsatellites and can interrogate the “unsequenceable” genomic trinucleotide repeat disorders.


Molecular Cell | 2018

N6-Methyladenine DNA Modification in the Human Genome

Chuan-Le Xiao; Song Zhu; Minghui He; De Chen; Qian Zhang; Ying Chen; Guoliang Yu; Jinbao Liu; Shang-Qian Xie; Feng Luo; Zhe Liang; Depeng Wang; Xiao-Chen Bo; Xiaofeng Gu; Kai Wang; Guang-Rong Yan

DNA N6-methyladenine (6mA) modification is the most prevalent DNA modification in prokaryotes, but whether it exists in human cells and whether it plays a role in human diseases remain enigmatic. Here, we showed that 6mA is extensively present in the human genome, and we cataloged 881,240 6mA sites accounting for ∼0.051% of the total adenines. [G/C]AGG[C/T] was the most significantly associated motif with 6mA modification. 6mA sites were enriched in the coding regions and mark actively transcribed genes in human cells. DNA 6mA and N6-demethyladenine modification in the human genome were mediated by methyltransferase N6AMT1 and demethylase ALKBH1, respectively. The abundance of 6mA was significantly lower in cancers, accompanied by decreased N6AMT1 and increased ALKBH1 levels, and downregulation of 6mA modification levels promoted tumorigenesis. Collectively, our results demonstrate that DNA 6mA modification is extensively present in human cells and the decrease of genomic DNA 6mA promotes human tumorigenesis.
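The motif [G/C]AGG[C/T] reported above reads as a five-base pattern: G or C, then AGG, then C or T. As a rough illustration only, it can be written as a regular expression and searched in a sequence. The helper name `find_motif` and the toy sequence are assumptions for this sketch, and it scans the given strand only; it says nothing about how the authors performed their motif analysis.

```python
import re

# [G/C]AGG[C/T] written as a regular expression. The lookahead lets
# overlapping occurrences (e.g. GAGGCAGGT) be reported separately.
MOTIF = re.compile(r"(?=([GC]AGG[CT]))")


def find_motif(seq):
    """Return 0-based start positions of motif matches on the given strand."""
    return [m.start() for m in MOTIF.finditer(seq.upper())]


# Toy sequence invented for illustration only.
toy = "TTGAGGCAACAGGTAA"
print(find_motif(toy))  # prints [2, 9]
```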


Scientific Reports | 2017

Hybrid sequencing and map finding (HySeMaFi): optional strategies for extensively deciphering gene splicing and expression in organisms without reference genome

Guogui Ning; Xu Cheng; Ping Luo; Fan Liang; Zhen Wang; Guoliang Yu; Xin Li; Depeng Wang; Manzhu Bao

Using second-generation sequencing (SGS) RNA-Seq strategies, extensive alternative splicing prediction is impractical and high variability in isoform expression quantification is inevitable in organisms without a true reference dataset. We report the development of a novel analysis method, termed hybrid sequencing and map finding (HySeMaFi), which combines the specific strengths of third-generation sequencing (TGS) (PacBio SMRT sequencing) and SGS (Illumina Hi-Seq/MiSeq sequencing) to effectively decipher gene splicing and to reliably estimate isoform abundance. Error-corrected long reads from TGS are capable of capturing full-length transcripts or large partial transcript fragments. Both true and false isoforms from a particular gene, as well as those containing all possible exons, could be generated by employing different assembly methods in SGS. We first developed an effective method to establish the mapping relationship between the error-corrected long reads and the longest assembled contig in every corresponding gene. According to the mapping data, the true splicing pattern of the genes was reliably detected, and quantification of the isoforms was also effectively determined. HySeMaFi is also the optimal strategy by which to decipher the full exon expression of a specific gene when the longest mapped contigs were chosen as the reference set.


GigaScience | 2017

Long-read sequence assembly of the firefly Pyrocoelia pectoralis genome

Xinhua Fu; Jingjing Li; Yu Tian; Weipeng Quan; Shu Zhang; Qian Liu; Fan Liang; Xinlei Zhu; Liangsheng Zhang; Depeng Wang; Jiang Hu

Background: Fireflies are a family of insects within the beetle order Coleoptera, or winged beetles, and they are one of the most well-known and loved insect species because of their bioluminescence. However, the firefly is in danger of extinction because of the massive destruction of its living environment. In order to improve the understanding of fireflies and protect them effectively, we sequenced the whole genome of the terrestrial firefly Pyrocoelia pectoralis. Findings: Here, we developed a highly reliable genome resource for the terrestrial firefly Pyrocoelia pectoralis (E. Oliv., 1883; Coleoptera: Lampyridae) using single molecule real time (SMRT) sequencing on the PacBio Sequel platform. In total, 57.8 Gb of long reads were generated and assembled into a 760.4-Mb genome, which is close to the estimated genome size and covered 98.7% complete and 0.7% partial insect Benchmarking Universal Single-Copy Orthologs. The k-mer analysis showed that this genome is highly heterozygous. However, our long-read assembly demonstrates continuousness with a contig N50 length of 3.04 Mb and the longest contig length of 13.69 Mb. Furthermore, 135 589 SSRs and 341 Mb of repeat sequences were detected. A total of 23 092 genes were predicted; 88.44% of genes were annotated with one or more related functions. Conclusions: We assembled a high-quality firefly genome, which will not only provide insights into the conservation and biodiversity of fireflies, but also provide a wealth of information to study the mechanisms of their sexual communication, bio-luminescence, and evolution.
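The abstract gives the raw read yield (57.8 Gb) and the assembly size (760.4 Mb) but does not state the sequencing depth. A back-of-the-envelope estimate is total bases divided by genome size; the sketch below assumes the assembly size is a fair stand-in for the true genome size, and the function name `mean_depth` is hypothetical.

```python
def mean_depth(total_bases, genome_size):
    """Approximate mean sequencing depth as total bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures from the abstract: 57.8 Gb of reads, 760.4 Mb assembly.
depth = mean_depth(57.8e9, 760.4e6)
print(f"{depth:.1f}x")  # prints 76.0x
```

This is raw-read depth before any filtering or error correction, so the usable depth behind the assembly may have been lower.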


bioRxiv | 2018

Localization of balanced chromosome translocation breakpoints by long-read sequencing on the Oxford Nanopore platform

Liang Hu; Fan Liang; Dehua Cheng; Zhiyuan Zhang; Guoliang Yu; Jianjun Zha; Yang Wang; Feng Wang; Yueqiu Tan; Depeng Wang; Kai Wang; Ge Lin

Structural variants (SVs) in genomes, including translocations, inversions, insertions, deletions and duplications, remain difficult to detect reliably by traditional genomic technologies. In particular, balanced translocations and inversions cannot be detected by microarrays since they do not alter chromosome copy numbers; they cannot be reliably detected by short-read sequencing either, since many breakpoints are located within repetitive regions of the genome that are unmappable by short reads. However, the detection and the precise localization of breakpoints at the nucleotide level are important to study the genetic causes in patients carrying balanced translocations or inversions. Long-read sequencing techniques, such as the Oxford Nanopore Technology (ONT), may detect these SVs in a more direct, efficient and accurate manner. In this study, we applied whole-genome long-read sequencing on the Oxford Nanopore GridION sequencer to detect the breakpoints from 6 carriers of balanced translocations and 1 carrier of an inversion, where SVs had initially been detected by karyotyping at the chromosome level. The results showed that all the balanced translocations were detected with ∼10X coverage and were consistent with the karyotyping results. PCR and Sanger sequencing confirmed 8 of the 14 breakpoints to single-base resolution, yet the other breakpoints could not be refined to single-base resolution due to their localization at highly repetitive regions or pericentromeric regions, or due to the possible presence of local deletions/duplications. Our results indicate that low-coverage whole-genome sequencing is an ideal tool for the precise localization of most translocation breakpoints and may provide haplotype information on the breakpoint-linked SNPs, which may be widely applied in SV detection, therapeutic monitoring, assisted reproduction technology (ART) and preimplantation genetic diagnosis (PGD).


bioRxiv | 2018

Single-molecule optical mapping enables accurate molecular diagnosis of facioscapulohumeral muscular dystrophy (FSHD)

Yi Dai; Pidong Li; Zhiqiang Wang; Fan Liang; Fan Yang; Li Fang; Yu Huang; Shangzhi Huang; Jiapeng Zhou; Depeng Wang; Liying Cui; Kai Wang

Facioscapulohumeral Muscular Dystrophy (FSHD) is a common adult muscular dystrophy in which the muscles of the face, shoulder blades and upper arms are among the most affected. FSHD is the only disease in which “junk” DNA is reactivated to cause disease, and the only known repeat array-related disease where fewer repeats cause disease. More than 95% of FSHD cases are associated with copy number loss of a 3.3 kb tandem repeat (D4Z4 repeat) at the subtelomeric chromosomal region 4q35, of which the pathogenic allele contains fewer than 10 repeats and has a specific genomic configuration called 4qA. Currently, genetic diagnosis of FSHD requires pulsed-field gel electrophoresis followed by Southern blot, which is labor-intensive, semi-quantitative and requires a long turnaround time. Here, we developed a novel approach for genetic diagnosis of FSHD by leveraging the Bionano Saphyr single-molecule optical mapping platform. Using a bioinformatics pipeline developed for this assay, we found that the method gives direct quantitative measurement of repeat numbers, can differentiate 4q35 and the highly paralogous 10q26 regions, can determine the 4qA/4qB allelic configuration, and can quantitate levels of post-zygotic mosaicism. We evaluated this approach on 5 patients (including two with post-zygotic mosaicism) and 2 patients (including one with post-zygotic mosaicism) from two separate cohorts, and had complete concordance with Southern blots, but with improved quantification of repeat numbers resolved between haplotypes. We concluded that single-molecule optical mapping is a viable approach for molecular diagnosis of FSHD and may be applied in clinical diagnostic settings once more validations are performed.


BMC Bioinformatics | 2018

NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data

Li Fang; Jiang Hu; Depeng Wang; Kai Wang

Background: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers. Results: In this study, we developed NextSV, a meta-caller to perform SV calling from low coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive/stringent) for different analysis purposes. We evaluated SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, compared with running any single SV caller, NextSV stringent call set had higher precision and balanced accuracy (F1 score) while NextSV sensitive call set had a higher recall. At 10X coverage, the recall of NextSV sensitive call set was 93.5 to 94.1% for deletions and 87.9 to 93.2% for insertions, indicating that ~10X coverage might be an optimal coverage to use in practice, considering the balance between the sequencing costs and the recall rates. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset. Conclusions: Our results provide useful guidelines for SV detection from low coverage whole-genome PacBio data and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.
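The precision, recall and F1 score mentioned in this abstract are the usual metrics for comparing a call set against a gold standard. A minimal sketch of how they are derived from true-positive, false-positive and false-negative counts follows; the function name and the toy counts are assumptions for illustration and are not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from call-set counts.

    tp: calls that match the gold standard
    fp: calls absent from the gold standard
    fn: gold-standard events that were not called
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy counts invented for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # prints 0.90 0.75 0.82
```

F1 is the harmonic mean of precision and recall, which is why a stringent call set can score higher on F1 even when a sensitive call set has the higher recall.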


bioRxiv | 2017

NextSV: a computational pipeline for structural variation analysis from low-coverage long-read sequencing

Li Fang; Jiang Hu; Depeng Wang; Kai Wang

Structural variants (SVs) in the human genome are implicated in a variety of human diseases. Long-read sequencing (such as that from PacBio) delivers much longer read lengths than short-read sequencing (such as that from Illumina) and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, users are often faced with issues such as what coverage is needed and how to optimally use the aligners and SV callers. Here, we evaluated SV calling performance of three SV calling algorithms (PBHoney-Tails, PBHoney-Spots and Sniffles) under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, at 10X coverage, 76% to 84% of deletions and 80% to 92% of insertions in the gold standard set can be detected by PBHoney-Spots. Combining both PBHoney-Spots and Sniffles greatly increased sensitivity, especially under lower coverages such as 6X. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset with low-coverage whole-genome PacBio sequencing. In addition, to automate SV calling, we developed a computational pipeline called NextSV, which integrates PBHoney and Sniffles and generates the union (high sensitivity) or intersection (high specificity) call sets. Our results provide useful guidelines for SV identification from low coverage whole-genome PacBio data and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.
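The union (high sensitivity) and intersection (high specificity) call sets described in this abstract can be pictured with a small sketch. The matching rule used here, same chromosome, same SV type and at least 50% reciprocal overlap, is an assumption chosen for illustration; the abstract does not state how NextSV decides that two calls describe the same event. All names and coordinates below are hypothetical, and the sketch covers interval-type events such as deletions only.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str


def same_event(a, b, min_overlap=0.5):
    """Treat two calls as the same event under reciprocal overlap (assumed rule)."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return False
    # Overlap must cover at least min_overlap of the longer call,
    # which implies it also covers that fraction of the shorter one.
    return overlap >= min_overlap * max(a.end - a.start, b.end - b.start)


def intersection(calls_a, calls_b):
    """Calls from A that are supported by some call in B."""
    return [a for a in calls_a if any(same_event(a, b) for b in calls_b)]


def union(calls_a, calls_b):
    """All calls from A plus calls from B that match nothing in A."""
    extra = [b for b in calls_b if not any(same_event(a, b) for a in calls_a)]
    return list(calls_a) + extra


# Toy calls invented for illustration only.
caller_a = [SV("chr1", 1000, 2000, "DEL"), SV("chr2", 5000, 5600, "DEL")]
caller_b = [SV("chr1", 1100, 2050, "DEL"), SV("chr3", 300, 900, "DEL")]

print(len(intersection(caller_a, caller_b)))  # prints 1
print(len(union(caller_a, caller_b)))         # prints 3
```

The intersection keeps only events both callers agree on, trading recall for fewer false positives, while the union keeps everything either caller reports.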


bioRxiv | 2016

Evaluation on Efficient Detection of Structural Variants at Low Coverage by Long-Read Sequencing

Li Fang; Jiang Hu; Depeng Wang; Kai Wang

Structural variants (SVs) in the human genome are implicated in a variety of human diseases. Long-read sequencing (such as that from PacBio) delivers much longer read lengths than short-read sequencing (such as that from Illumina) and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, users are often faced with issues such as what coverage is needed and how to optimally use the aligners and SV callers. Here, we evaluated SV calling performance of three SV calling algorithms (PBHoney-Tails, PBHoney-Spots and Sniffles) under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, at 10X coverage, 76% to 84% of deletions and 80% to 92% of insertions in the gold standard set can be detected by PBHoney-Spots. Combining both PBHoney-Spots and Sniffles greatly increased sensitivity, especially under lower coverages such as 6X. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset with low-coverage whole-genome PacBio sequencing. In addition, to automate SV calling, we developed a computational pipeline called NextSV, which integrates PBHoney and Sniffles and generates the union (high sensitivity) or intersection (high specificity) call sets. Our results provide useful guidelines for SV identification from low coverage whole-genome PacBio data and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.

Collaboration


Dive into Depeng Wang's collaborations.

Top Co-Authors

Li Fang (Children's Hospital of Philadelphia)
Feng Wang (Wuhan Institute of Technology)
Ge Lin (Central South University)
Liangsheng Zhang (Fujian Agriculture and Forestry University)
Lin Wang (Huazhong University of Science and Technology)
Xiaofeng Gu (National University of Singapore)
Zhe Liang (National University of Singapore)
Bodi Gao (Central South University)