Junwei Luo | Researchain

Publication

Featured researches published by Junwei Luo.

Bioinformatics | 2016

Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm

Huimin Luo; Jianxin Wang; Min Li; Junwei Luo; Xiaoqing Peng; Fang-Xiang Wu; Yi Pan

MOTIVATION Drug repositioning, which aims to identify new indications for existing drugs, offers a promising alternative to reduce the total time and cost of traditional drug development. Many computational strategies for drug repositioning have been proposed, which are based on similarities among drugs and diseases. Current studies typically use either only drug-related properties (e.g. chemical structures) or only disease-related properties (e.g. phenotypes) to calculate drug or disease similarity, respectively, while not taking into account the influence of known drug-disease association information on the similarity measures. RESULTS In this article, based on the assumption that similar drugs are normally associated with similar diseases and vice versa, we propose a novel computational method named MBiRW, which uses comprehensive similarity measures and the Bi-Random walk (BiRW) algorithm to identify potential novel indications for a given drug. Comprehensive similarity measures for drugs and diseases are first developed by integrating drug or disease feature information with known drug-disease associations. A drug similarity network and a disease similarity network are then constructed and incorporated into a heterogeneous network with known drug-disease interactions. Based on this drug-disease heterogeneous network, the BiRW algorithm is adopted to predict novel potential drug-disease associations. Computational experiments on various datasets demonstrate that the proposed approach has reliable prediction performance and outperforms several recent computational drug repositioning approaches. Moreover, case studies of five selected drugs further confirm the superior performance of our method in discovering potential indications for drugs in practice.
AVAILABILITY AND IMPLEMENTATION http://github.com//bioinfomaticsCSU/MBiRW CONTACT: [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
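The abstract above describes walking on a drug similarity network and a disease similarity network at the same time. The following is a minimal sketch of a generic bi-random walk with restart, not the MBiRW implementation: the function names, the row normalization, the default `alpha`, and the step limits are all assumptions made for illustration, and the comprehensive similarity measures from the paper are not reproduced.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 (rows that sum to 0 stay all zeros)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def blend(walked, prior, alpha):
    """One restart step: alpha * walked + (1 - alpha) * prior."""
    return [[alpha * w + (1 - alpha) * p for w, p in zip(w_row, p_row)]
            for w_row, p_row in zip(walked, prior)]


def bi_random_walk(drug_sim, disease_sim, assoc, alpha=0.3,
                   left_steps=2, right_steps=2):
    """Score drug-disease pairs; assoc[i][j] is 1 for a known association.

    Assumes assoc (drugs x diseases) contains at least one known association.
    """
    total = sum(map(sum, assoc))
    prior = [[v / total for v in row] for row in assoc]
    w_drug = row_normalize(drug_sim)        # hypothetical normalization choice
    w_disease = row_normalize(disease_sim)
    scores = prior
    for step in range(max(left_steps, right_steps)):
        walks = []
        if step < left_steps:   # walk on the drug similarity network
            walks.append(blend(matmul(w_drug, scores), prior, alpha))
        if step < right_steps:  # walk on the disease similarity network
            walks.append(blend(matmul(scores, w_disease), prior, alpha))
        # Average whichever walks were taken in this step, cell by cell.
        scores = [[sum(cells) / len(walks) for cells in zip(*rows)]
                  for rows in zip(*walks)]
    return scores


# Toy data: drug 1 is similar to drug 0, which is known to treat disease 0.
drug_sim = [[1.0, 0.9], [0.9, 1.0]]
disease_sim = [[1.0, 0.0], [0.0, 1.0]]
assoc = [[1, 0], [0, 0]]
scores = bi_random_walk(drug_sim, disease_sim, assoc)
```

In this toy example, drug 1 has no known indication, yet it receives a positive score for disease 0 because it is similar to drug 0.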

Bioinformatics | 2015

EPGA: de novo assembly using the distributions of reads and insert size

Junwei Luo; Jianxin Wang; Zhen Zhang; Fang-Xiang Wu; Min Li; Yi Pan

MOTIVATION In genome assembly, the primary issue is how to determine the upstream and downstream sequence regions of sequence seeds for constructing long contigs or scaffolds. When extending a sequence seed, repetitive regions in the genome always cause multiple feasible extension candidates, which increases the difficulty of genome assembly. The universally accepted solution is to choose one candidate based on read overlaps and paired-end (mate-pair) reads. However, this solution faces difficulties with some complex repetitive regions. In addition, sequencing errors may produce false repetitive regions, and uneven sequencing depth leaves some sequence regions with too few or too many reads. All of these problems prevent existing assemblers from obtaining satisfactory assembly results. RESULTS In this article, we develop an algorithm, called extract paths for genome assembly (EPGA), which extracts paths from the De Bruijn graph for genome assembly. EPGA uses a new score function to evaluate extension candidates based on the distributions of reads and insert size. The distribution of reads can solve problems caused by sequencing errors and short repetitive regions. By assessing the variation of the distribution of insert size, EPGA can solve problems introduced by some complex repetitive regions. To handle uneven sequencing depth, EPGA uses relative mapping to evaluate extension candidates. On real datasets, we compare the performance of EPGA with that of other popular assemblers. The experimental results demonstrate that EPGA can effectively obtain longer and more accurate contigs and scaffolds.
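To make the notion of "multiple feasible extension candidates" concrete, here is a minimal sketch of a De Bruijn graph built from reads, with candidates listed for one node. It is an illustration only: the function names are hypothetical, and candidates are ranked by raw k-mer multiplicity, which is a stand-in and not EPGA's score function based on read and insert size distributions.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to a Counter of the (k-1)-mers that follow it."""
    graph = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1   # edge: prefix -> suffix
    return graph


def extension_candidates(graph, node):
    """Return (successor, multiplicity) pairs, most supported first."""
    return sorted(graph.get(node, {}).items(), key=lambda item: -item[1])


reads = ["ACGTAC", "CGTAG"]
graph = build_de_bruijn(reads, k=3)
candidates = extension_candidates(graph, "TA")
```

Here the node `TA` has two successors (`AC` and `AG`) with equal support, which is exactly the ambiguous situation where an assembler needs extra evidence such as paired-end information.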

Bioinformatics | 2017

BOSS: a novel scaffolding algorithm based on an optimized scaffold graph

Junwei Luo; Jianxin Wang; Zhen Zhang; Min Li; Fang-Xiang Wu

Motivation: While aiming to determine orientations and orders of fragmented contigs, scaffolding is an essential step of assembly pipelines and can make assembly results more complete. Most existing scaffolding tools adopt scaffold graph approaches. However, due to repetitive regions in the genome, sequencing errors and uneven sequencing depth, constructing an accurate scaffold graph is still a challenging task. Results: In this paper, we present a novel algorithm (called BOSS), which employs paired reads for scaffolding. To construct a scaffold graph, BOSS utilizes the distribution of insert size to decide whether an edge between two vertices (contigs) should be added and how an edge should be weighted. Moreover, BOSS adopts an iterative strategy to detect spurious edges whose removal can guarantee no contradictions in the scaffold graph. Based on the scaffold graph constructed, BOSS employs a heuristic algorithm to sort vertices (contigs) and then generates scaffolds. The experimental results demonstrate that BOSS produces more satisfactory scaffolds, compared with other popular scaffolding tools on real sequencing data of four genomes. Availability and Implementation: BOSS is publicly available for download at https://github.com/bioinfomaticsCSU/BOSS. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
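The idea of using the insert size distribution to decide whether two contigs should be linked can be sketched as follows. This is a generic illustration under stated assumptions, not BOSS's actual rule: the coordinate convention, the function names, and the thresholds `min_links` and `tolerance` are all hypothetical.

```python
from statistics import mean, pstdev


def estimate_gap(links, len_a, insert_mean):
    """Estimate the gap between contig A and contig B from linking read pairs.

    Each link is (pos_a, pos_b): the start of the forward read on A and the
    end of its mate on B, with both contigs assumed to be in the same
    orientation. With a gap of 0 the pair would span (len_a - pos_a) + pos_b.
    """
    gaps = [insert_mean - ((len_a - pos_a) + pos_b) for pos_a, pos_b in links]
    return mean(gaps), pstdev(gaps)


def accept_edge(links, len_a, insert_mean, insert_sd,
                min_links=3, tolerance=3.0):
    """Keep an edge only if its links agree with the library's insert sizes."""
    if len(links) < min_links:
        return False
    gap, spread = estimate_gap(links, len_a, insert_mean)
    # Links that imply wildly different gaps, or a large overlap between the
    # contigs, are treated as inconsistent with the insert size distribution.
    return spread <= tolerance * insert_sd and gap >= -tolerance * insert_sd


consistent = [(800, 100), (820, 120), (790, 90)]
scattered = [(800, 100), (100, 900), (500, 2000)]
gap, spread = estimate_gap(consistent, len_a=1000, insert_mean=500)
```

With a contig of length 1000 and a mean insert size of 500, the three consistent links all imply a gap of 200, while the scattered links disagree with each other and would be rejected by `accept_edge`.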

Bioinformatics and Biomedicine | 2015

An efficient method to identify essential proteins for different species by integrating protein subcellular localization information

Xiaoqing Peng; Jianxin Wang; Jiancheng Zhong; Junwei Luo; Yi Pan

Essential proteins are indispensable to maintain life activities in living organisms, and play important roles in the studies of pathology, synthetic biology, and drug design. Many computational methods are employed to identify essential proteins from Protein-protein Interaction Networks (PINs). In this paper, considering the different importance of protein-protein interactions which take place in different subcellular compartments, a Compartment Importance Centrality (CIC) method is proposed to detect essential proteins by integrating protein subcellular localization information. The experiments were carried out on four species (Saccharomyces cerevisiae, Homo sapiens, Mus musculus and Drosophila melanogaster), and the performance of CIC was compared with other centrality methods, including the centrality methods solely based on topology and the ones combining both topology and other biological knowledge. The results show that the CIC method performs better at predicting essential proteins on all four species. Furthermore, unlike methods that overfit the features of essential proteins of one species and may perform poorly on other species, CIC is widely applicable to identifying essential proteins across different species.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2017

ISEA: Iterative Seed-Extension Algorithm for De Novo Assembly Using Paired-End Information and Insert Size Distribution

Min Li; Zhongxiang Liao; Yiming He; Jianxin Wang; Junwei Luo; Yi Pan

The purpose of de novo assembly is to report more contiguous, complete, and less error-prone contigs. Thanks to the advent of next generation sequencing (NGS) technologies, the cost of producing high depth reads has been reduced greatly. However, due to the disadvantages of NGS, de novo assembly has to face the difficulties brought by repeat regions, error rate, and low sequencing coverage in some regions. Although many de novo algorithms have been proposed to solve these problems, de novo assembly still remains a challenge. In this article, we developed an iterative seed-extension algorithm for de novo assembly, called ISEA. To avoid the negative impact induced by error rate, ISEA utilizes read overlaps and paired-end information to correct erroneous reads before assembling. While extending seeds in a De Bruijn graph, ISEA uses an elaborately designed score function based on paired-end information and the distribution of insert size to solve the repeat region problem. By employing the distribution of insert size, the score function can also reduce the influence of erroneous reads. In scaffolding, ISEA adopts a relaxed strategy to join contigs that were terminated for low coverage during the extension. The performance of ISEA was compared with six previous popular assemblers on four real datasets. The experimental results demonstrate that ISEA can effectively obtain longer and more accurate scaffolds.

Bioinformatics | 2016

Sprites: detection of deletions from sequencing data by re-aligning split reads

Zhen Zhang; Jianxin Wang; Junwei Luo; Xiaojun Ding; Jiancheng Zhong; Jun Wang; Fang-Xiang Wu; Yi Pan

MOTIVATION Advances in next generation sequencing technologies and the availability of short read data enable the detection of structural variations (SVs). Deletions, an important type of SVs, have been suggested to be associated with genetic diseases. There are three types of deletions: blunt deletions, deletions with microhomologies and deletions with microinsertions. The last two types are very common in the human genome, but they are difficult to detect. Furthermore, finding deletions from sequencing data remains challenging. It is highly appealing to develop sensitive and accurate methods to detect deletions from sequencing data, especially deletions with microhomology and deletions with microinsertion. RESULTS We present a novel method called Sprites (SPlit Read re-alIgnment To dEtect Structural variants) which finds deletions from sequencing data. It aligns a whole soft-clipping read rather than its clipped part to the target sequence, a segment of the reference which is determined by spanning reads, in order to find the longest prefix or suffix of the read that has a match in the target sequence. This alignment aims to solve the problem of deletions with microhomologies and deletions with microinsertions. Using both simulated and real data we show that Sprites performs better on detecting deletions compared with other current methods in terms of F-score. AVAILABILITY AND IMPLEMENTATION Sprites is open source software and freely available at https://github.com/zhangzhen/sprites CONTACT [email protected] SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
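The core step described above, finding the longest prefix or suffix of a read that has a match in the target sequence, can be sketched with exact string matching. This is a simplified illustration and not the Sprites alignment: the function names are hypothetical, and only exact matches are considered, whereas a real aligner would need to tolerate sequencing errors.

```python
def longest_prefix_match(read, target):
    """Return (length, position) of the longest read prefix found in target.

    If a prefix of length n occurs in target, every shorter prefix occurs too,
    so the search can stop at the first length that fails to match.
    """
    best_len, best_pos = 0, -1
    for length in range(1, len(read) + 1):
        pos = target.find(read[:length])
        if pos < 0:
            break
        best_len, best_pos = length, pos
    return best_len, best_pos


def longest_suffix_match(read, target):
    """Same idea for suffixes, implemented by reversing both sequences."""
    length, pos = longest_prefix_match(read[::-1], target[::-1])
    if length == 0:
        return 0, -1
    # Convert the position in the reversed target back to forward coordinates.
    return length, len(target) - pos - length


target = "AAACCCGGGTTT"
read = "CCCGTT"          # toy read spanning a deletion of part of the G run
prefix = longest_prefix_match(read, target)
suffix = longest_suffix_match(read, target)
```

In this toy case the prefix `CCCG` and the suffix `GTT` together are one base longer than the read, because the shared `G` matches on both sides of the breakpoint, which is the kind of ambiguity a microhomology introduces.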

Bioinformatics | 2015

EPGA2: memory-efficient de novo assembler

Junwei Luo; Jianxin Wang; Weilong Li; Zhen Zhang; Fang-Xiang Wu; Min Li; Yi Pan

MOTIVATION In genome assembly, as sequencing coverage and genome size grow, most current software requires a large amount of memory to handle the large volume of sequence data. However, most researchers cannot meet these computing resource requirements, which prevents most current software from being used in practice. RESULTS In this article, we present an updated algorithm called EPGA2, which applies some new modules and can produce improved assembly results with a small memory footprint. To reduce peak memory in genome assembly, EPGA2 adopts the memory-efficient DSK to count K-mers and a revised BCALM to construct the De Bruijn graph. Moreover, EPGA2 parallelizes the Contigs Merging step and adds Errors Correction to its pipeline. Our experiments demonstrate that all these changes in EPGA2 are more useful for genome assembly. AVAILABILITY AND IMPLEMENTATION EPGA2 is publicly available for download at https://github.com/bioinfomaticsCSU/EPGA2.
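The memory saving that a partitioned k-mer counter aims for can be illustrated with a small sketch. It only mimics the general idea of counting one hash partition at a time so that a single partition's table is in memory at once. It is not DSK itself: the function name and the use of `zlib.crc32` as the partitioning hash are assumptions, nothing is written to disk, and reverse-complement (canonical) k-mers are ignored.

```python
import zlib
from collections import Counter


def count_kmers_partitioned(reads, k, n_partitions=4):
    """Yield (kmer, count) pairs, holding only one partition's counts at a time.

    `reads` must be re-iterable (for example a list), because the input is
    scanned once per partition.
    """
    for part in range(n_partitions):
        counts = Counter()
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                # A k-mer always hashes to the same partition, so each
                # partition's counts are complete when its pass ends.
                if zlib.crc32(kmer.encode()) % n_partitions == part:
                    counts[kmer] += 1
        yield from counts.items()


kmer_counts = dict(count_kmers_partitioned(["ACGTACGT"], k=3))
```

The trade-off is more passes over the input in exchange for a smaller peak table, which is the same kind of time-for-memory exchange the abstract describes.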

International Symposium on Bioinformatics Research and Applications | 2018

Sprites2: Detection of Deletions Based on an Accurate Alignment Strategy

Zhen Zhang; Jianxin Wang; Junwei Luo; Juan Shang; Min Li; Fang-Xiang Wu; Yi Pan

Since humans are diploid organisms, homozygous and heterozygous deletions are ubiquitous in the human genome. How to distinguish homozygous and heterozygous deletions is an important issue for current structural variation detection tools. Additionally, due to the problems of sequencing errors, micro-homologies and micro-insertions, breakpoint locations identified with common alignment tools which use a greedy strategy may not be the true deletion locations, and usually lead to false structural variation detections. In this paper, we propose a deletion detection method called Sprites2. Compared with Sprites, Sprites2 adds the following novel function modules: (1) Sprites2 takes advantage of the variance of the insert size distribution to determine the type of deletions, which can enhance the accuracy of deletion calls; (2) Sprites2 uses a novel alignment strategy based on AGE (an algorithm that aligns the 5’ and 3’ ends of two sequences simultaneously) to locate breakpoints, which can solve the problems introduced by sequencing errors, micro-homologies and micro-insertions. To test the performance of Sprites2, simulated and real datasets are used in our experiments, and some popular structural variation detection tools are compared with Sprites2. The experimental results show that Sprites2 can improve deletion detection performance. Sprites2 is publicly available at https://github.com/zhangzhen/sprites2.

International Conference on Intelligent Computing | 2017

LSLS: A Novel Scaffolding Method Based on Path Extension

Min Li; Li Tang; Zhongxiang Liao; Junwei Luo; Fang-Xiang Wu; Yi Pan; Jianxin Wang

While aiming to determine orientations and orders of fragmented contigs, scaffolding is an essential step of assembly pipelines and can make assembly results more complete. Most existing scaffolding tools adopt the scaffold graph approach. However, constructing an accurate scaffold graph is still a challenging task. Removing potential false relationships is key to achieving better scaffolding performance, yet most scaffolding approaches neglect the impact of uneven sequencing depth, which may cause more sequencing errors and finally result in many false relationships. In this paper, we present a new scaffolding method, LSLS (Loose-Strict-Loose Scaffolding), which is based on path extension. LSLS uses different strategies to extend paths, which makes it more adaptive to different sequencing depths. For the problem of multiple paths, we designed a score function, based on the distribution of read pairs, to evaluate the reliability of path candidates and extend paths with the candidates that have the highest score. In addition, LSLS contains a new gap estimation method, which can estimate gap sizes more precisely. The experimental results on two standard datasets show that LSLS achieves better performance.

Computational Biology and Chemistry | 2017