Shaoliang Peng
National University of Defense Technology
Publication
Featured research published by Shaoliang Peng.
GigaScience | 2012
Ruibang Luo; Binghang Liu; Yinlong Xie; Zhenyu Li; Weihua Huang; Jianying Yuan; Guangzhu He; Yanxiang Chen; Qi Pan; Yunjie Liu; Jingbo Tang; Gengxiong Wu; Hao Zhang; Yujian Shi; Yong Liu; Chang Yu; Bo Wang; Yao Lu; Changlei Han; David W. Cheung; Siu-Ming Yiu; Shaoliang Peng; Zhu Xiao-qian; Guangming Liu; Xiangke Liao; Yingrui Li; Huanming Yang; Jian Wang; Tak Wah Lam; Jun Wang
Background: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. Findings: To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes. Conclusions: Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which is 3-fold and 50-fold longer than the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory consumption.
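The N50 statistic quoted in this abstract can be illustrated with a minimal sketch. It uses the common definition (the length L such that sequences of length at least L cover at least half of the total assembled bases); the function name `n50` and the toy lengths are illustrative assumptions, not part of SOAPdenovo2.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths): total is 54, and 10 + 9 + 8 = 27
# reaches half of it, so the N50 is 8.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it measures continuity and says nothing about correctness, which is why the abstract reports coverage and accuracy alongside it.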
Genome Biology | 2013
Wenlong Jia; Kunlong Qiu; Minghui He; Pengfei Song; Quan Zhou; Feng Zhou; Yuan Yu; Dandan Zhu; Michael L. Nickerson; Shengqing Wan; Xiangke Liao; Xiaoqian Zhu; Shaoliang Peng; Yingrui Li; Jun Wang; Guangwu Guo
We have developed a new method, SOAPfuse, to identify fusion transcripts from paired-end RNA-Seq data. SOAPfuse applies an improved partial exhaustion algorithm to construct a library of fusion junction sequences, which can be used to efficiently identify fusion events, and employs a series of filters to nominate high-confidence fusion transcripts. Compared with other released tools, SOAPfuse achieves higher detection efficiency and consumes fewer computing resources. We applied SOAPfuse to RNA-Seq data from two bladder cancer cell lines, and confirmed 15 fusion transcripts, including several novel events common to both cell lines. SOAPfuse is available at http://soap.genomics.org.cn/soapfuse.html.
PLOS ONE | 2013
Ruibang Luo; Thomas K. F. Wong; Jianqiao Zhu; Chi-Man Liu; Xiaoqian Zhu; Edward Wu; Lap-Kei Lee; Haoxiang Lin; Wenjuan Zhu; David W. Cheung; Hing-Fung Ting; Siu-Ming Yiu; Shaoliang Peng; Chang Yu; Yingrui Li; Ruiqiang Li; Tak Wah Lam
To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2, GEM and the GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Improving on its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Real data evaluation using the human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.
Nature Communications | 2014
Xiaodong Fang; Eviatar Nevo; Lijuan Han; Erez Y. Levanon; Jing Zhao; Aaron Avivi; Denis M. Larkin; Xuanting Jiang; Sergey Feranchuk; Yabing Zhu; Alla Fishman; Yue Feng; Noa Sher; Zhiqiang Xiong; Thomas Hankeln; Zhiyong Huang; Vera Gorbunova; Lu Zhang; Wei Zhao; Derek E. Wildman; Yingqi Xiong; Andrei V. Gudkov; Qiumei Zheng; Gideon Rechavi; Sanyang Liu; Lily Bazak; Jie Chen; Binyamin A. Knisbacher; Yao Lu; Imad Shams
The blind mole rat (BMR), Spalax galili, is an excellent model for studying mammalian adaptation to life underground and medical applications. The BMR spends its entire life underground, which protects it from predators and climatic fluctuations but exposes it to multiple stressors such as darkness, hypoxia, hypercapnia, energetics and high pathogenicity. Here we sequence and analyse the BMR genome and transcriptome, highlighting the possible genomic adaptive responses to the underground stressors. Our results show high rates of RNA/DNA editing, reduced chromosome rearrangements, an over-representation of short interspersed elements (SINEs) probably linked to hypoxia tolerance, degeneration of vision and progression of photoperiodic perception, tolerance to hypercapnia and hypoxia and resistance to cancer. The remarkable traits of the BMR, together with its genomic and transcriptomic information, enhance our understanding of adaptation to extreme environments and will enable the utilization of BMR models for biomedical research in the fight against cancer, stroke and cardiovascular diseases.
Cancer Research | 2016
Xiangchun Li; William Ka Kei Wu; Rui Xing; Yuexin Liu; Xiaodong Fang; Yanlin Zhang; Mengyao Wang; Jiaqian Wang; Lin Li; Yong Zhou; Senwei Tang; Shaoliang Peng; Kunlong Qiu; Longyun Chen; Kexin Chen; Huanming Yang; Wei Zhang; Matthew T. V. Chan; Youyong Lu; Joseph J.Y. Sung; Jun Yu
Gastric cancer is not a single disease, and its subtype classification is still evolving. Next-generation sequencing studies have identified novel genetic drivers of gastric cancer, but their use as molecular classifiers or prognostic markers of disease outcome has yet to be established. In this study, we integrated somatic mutational profiles and clinicopathologic information from 544 gastric cancer patients from previous genomic studies to identify significantly mutated genes (SMG) with prognostic relevance. Gastric cancer patients were classified into regular (86.8%) and hypermutated (13.2%) subtypes based on mutation burden. Notably, TpCpW mutations occurred significantly more frequently in regular, but not hypermutated, gastric cancers, where they were associated with APOBEC expression. In the former group, six previously unreported (XIRP2, NBEA, COL14A1, CNBD1, ITGAV, and AKAP6) and 12 recurrent mutated genes exhibited high mutation prevalence (≥3.0%) and an unexpectedly higher incidence of nonsynonymous mutations. We also identified two molecular subtypes of regular-mutated gastric cancer that were associated with distinct prognostic outcomes, independently of disease staging, as confirmed in a distinct patient cohort by targeted capture sequencing. Finally, in diffuse-type gastric cancer, CDH1 mutation was found to be associated with shortened patient survival, independently of disease staging. Overall, our work identified previously unreported SMGs and a mutation signature predictive of patient survival in newly classified subtypes of gastric cancer, offering opportunities to stratify patients into optimal treatment plans based on molecular subtyping. Cancer Res; 76(7); 1724-32. ©2016 AACR.
Journal of Physical Chemistry B | 2014
Jinan Wang; Shaoliang Peng; Benjamin P. Cossins; Xiangke Liao; Kaixian Chen; Qiang Shao; Xiaoqian Zhu; Jiye Shi; Weiliang Zhu
The effects of the intrinsic structural flexibility of the calmodulin protein on the mechanism of its allosteric conformational transition are investigated in this article. Using a novel in silico approach, the conformational transition pathways of intact calmodulin as well as the isolated N- and C-terminal domains are identified and energetically characterized. It is observed that the central α-helix linker amplifies the structural flexibility of intact Ca²⁺-free calmodulin, which might facilitate the transition of the two domains. As a result, the global conformational transition of Ca²⁺-free calmodulin is initiated by the barrierless transition of the two domains and proceeds through the barrier-associated unwinding and bending of the central α-helix linker. The binding of Ca²⁺ cations to calmodulin further increases the structural flexibility of the C-terminal domain and results in a downhill transition pathway in which all regions transit in a concerted manner. On the other hand, separating the N- and C-terminal domains from the calmodulin protein removes the mediating function of the central α-helix linker, leading to more difficult conformational transitions of both domains. The present study provides novel insights into the correlation between the integrity of the protein, its structural flexibility, and the mechanism of conformational transition of proteins like calmodulin.
BMC Bioinformatics | 2016
Haidong Lan; Yuandong Chan; Kai Xu; Bertil Schmidt; Shaoliang Peng; Weiguo Liu
Background: Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. Results: This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture, i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Conclusions: Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
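For readers unfamiliar with the kernel being parallelized, the following is a minimal, unoptimized sketch of Smith-Waterman local alignment scoring. It assumes a simple match/mismatch score and a linear gap penalty (all parameter values are illustrative), whereas the paper scans protein databanks, where a substitution matrix and affine gaps would normally be used. It is not the authors' implementation and contains none of the cluster, thread, or vector-level parallelism described above.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    Only two rows of the dynamic-programming matrix are kept, since the
    score (not the alignment itself) is all a database scan needs.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                        # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,            # gap in b
                curr[j - 1] + gap,        # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(cell_updates, seconds):
    """Giga cell updates per second: DP cells computed / time / 1e9."""
    return cell_updates / seconds / 1e9


print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Each query-versus-subject comparison updates len(a) × len(b) cells and is independent of every other comparison, which is what makes the database scan amenable to the data-parallel decomposition the abstract describes.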
PACBB | 2014
Yingbo Cui; Xiangke Liao; Xiaoqian Zhu; Bingqiang Wang; Shaoliang Peng
Mapping sequenced reads to a reference genome, also known as sequence read alignment, is central to sequence analysis. Emerging sequencing technologies such as next generation sequencing (NGS) have led to an explosion of sequencing data, which is far beyond the processing capabilities of existing alignment tools. Consequently, sequence alignment has become the bottleneck of sequence analysis. Intensive computing power is required to address this challenge. A key feature of sequence alignment is that different reads are independent. Considering this property, we proposed a multi-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our massively parallel sequence aligner: mBWA. mBWA contains two levels of parallelization: firstly, parallelization of data input/output (IO) and read alignment by a three-stage parallel pipeline; secondly, parallelization enabled by Intel Many Integrated Core (MIC) coprocessor technology. In this paper, we demonstrate that mBWA outperforms BWA through a combination of those techniques. To the best of our knowledge, mBWA is the first sequence alignment tool to run on Intel MIC, and it can achieve more than 5-fold speedup over the original BWA while maintaining the alignment precision.
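The three-stage pipeline idea (read input, align, write output, each running concurrently) can be sketched generically as below. This is only an illustration of the structure under stated assumptions: `run_pipeline`, `align_batch`, and the queue size are hypothetical names and values, the "alignment" is a placeholder callable, and nothing here reflects mBWA's actual code or its MIC offloading. Error handling is omitted for brevity.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def run_pipeline(batches, align_batch, queue_size=4):
    """Run reader, aligner and writer stages concurrently.

    batches     -- iterable of read batches (stands in for file input)
    align_batch -- hypothetical callable that aligns one batch
    Returns the aligned batches in input order (stands in for file output).
    """
    to_align = queue.Queue(maxsize=queue_size)
    to_write = queue.Queue(maxsize=queue_size)
    results = []

    def reader():
        # Stage 1: load batches while earlier ones are still being aligned.
        for batch in batches:
            to_align.put(batch)
        to_align.put(_DONE)

    def aligner():
        # Stage 2: reads are independent, so batches need no shared state.
        while True:
            batch = to_align.get()
            if batch is _DONE:
                to_write.put(_DONE)
                return
            to_write.put(align_batch(batch))

    def writer():
        # Stage 3: emit results while later batches are still in flight.
        while True:
            aligned = to_write.get()
            if aligned is _DONE:
                return
            results.append(aligned)

    threads = [threading.Thread(target=stage) for stage in (reader, aligner, writer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Usage with a placeholder "aligner" that just upper-cases each read.
print(run_pipeline([["acgt", "ttga"], ["ggcc"]], lambda batch: [r.upper() for r in batch]))
```

The bounded queues keep the reader from racing arbitrarily far ahead of the aligner, so memory use stays proportional to the queue size rather than to the input size.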
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2015
Qian Cheng; Shaoliang Peng; Yutong Lu; Weiliang Zhu; Zhijian Xu; Xinben Zhang
Molecular docking is a time-consuming process that requires a substantial amount of computing power. D3DOCKxb was developed for investigating the effects of halogen bonds in drug discovery by adding two precise score functions to AutoDock. The docking accuracy of D3DOCKxb is better than that of AutoDock, which can be attributed to the more complicated processing logic of D3DOCKxb. Consequently, parallel optimization of D3DOCKxb is an even more challenging task. In this paper, we developed mD3DOCKxb, a MIC-enabled version of D3DOCKxb, which utilizes Intel Xeon Phi, a Many Integrated Core (MIC) accelerator, to boost the docking performance. We parallelized the Lamarckian Genetic Algorithm (LGA) in D3DOCKxb with OpenMP and ported it to MIC with a number of optimizations. A 12x to 18x speedup can be achieved, depending on the number of LGA iterations.
Nature Communications | 2015
Xiaodong Fang; Eviatar Nevo; Lijuan Han; Erez Y. Levanon; Jing Zhao; Aaron Avivi; Denis M. Larkin; Xuanting Jiang; Sergey Feranchuk; Yabing Zhu; Alla Fishman; Yue Feng; Noa Sher; Zhiqiang Xiong; Thomas Hankeln; Zhiyong Huang; Vera Gorbunova; Lu Zhang; Wei Zhao; Derek E. Wildman; Yingqi Xiong; Andrei V. Gudkov; Qiumei Zheng; Gideon Rechavi; Sanyang Liu; Lily Bazak; Jie Chen; Binyamin A. Knisbacher; Yao Lu; Imad Shams
Corrigendum: Genome-wide adaptive complexes to underground stresses in blind mole rats Spalax