Publication


Featured research published by Junbo Duan.


PLOS ONE | 2013

Comparative Studies of Copy Number Variation Detection Methods for Next-Generation Sequencing Technologies

Junbo Duan; Ji-Gang Zhang; Hong-Wen Deng; Yu-Ping Wang

Copy number variation (CNV) has played an important role in studies of susceptibility or resistance to complex diseases. Traditional methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution of genomic regions. Following the emergence of next generation sequencing (NGS) technologies, CNV detection methods based on short read data have recently been developed. However, because the procedures are relatively young, their performance is not fully understood. To help investigators choose suitable methods to detect CNVs, comparative studies are needed. We compared six publicly available CNV detection methods: CNV-seq, FREEC, readDepth, CNVnator, SegSeq and event-wise testing (EWT). They are evaluated on both simulated and real data with different experiment settings. The receiver operating characteristic (ROC) curve is employed to demonstrate the detection performance in terms of sensitivity and specificity, box plots are employed to compare their performance in terms of breakpoint and copy number estimation, Venn diagrams are employed to show the consistency among these methods, and the F-score is employed to show the overlapping quality of detected CNVs. The computational demands are also studied. The results of our work provide a comprehensive evaluation of the performance of the selected CNV detection methods, which will help biological investigators choose the best possible method.
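As a rough illustration of how an F-score can summarize the overlap between detected and reference CNVs, the sketch below scores overlap per base. The abstract does not state how overlap is measured, so the per-base definition, the half-open intervals, and the function names `interval_bases` and `overlap_scores` are assumptions made for this example.

```python
def interval_bases(intervals):
    """Expand half-open (start, end) intervals into a set of base positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def overlap_scores(truth, detected):
    """Return (precision, recall, f_score), all measured per base."""
    truth_bases = interval_bases(truth)
    detected_bases = interval_bases(detected)
    shared = len(truth_bases & detected_bases)
    precision = shared / len(detected_bases) if detected_bases else 0.0
    recall = shared / len(truth_bases) if truth_bases else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical reference and detected CNV regions on one chromosome.
truth = [(100, 200), (500, 600)]
detected = [(150, 250), (500, 550)]
precision, recall, f_score = overlap_scores(truth, detected)
```

Expanding intervals into sets keeps the example short; for genome-scale data an interval-arithmetic version would be needed.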


IEEE Transactions on Signal Processing | 2011

From Bernoulli–Gaussian Deconvolution to Sparse Signal Restoration

Charles Soussen; Jérôme Idier; David Brie; Junbo Duan

Formulated as a least square problem under an l0 constraint, sparse signal restoration is a discrete optimization problem, known to be NP complete. Classical algorithms include, by increasing cost and efficiency, matching pursuit (MP), orthogonal matching pursuit (OMP), orthogonal least squares (OLS), stepwise regression algorithms and the exhaustive search. We revisit the single most likely replacement (SMLR) algorithm, developed in the mid-1980s for Bernoulli-Gaussian signal restoration. We show that the formulation of sparse signal restoration as a limit case of Bernoulli-Gaussian signal restoration leads to an l0-penalized least square minimization problem, to which SMLR can be straightforwardly adapted. The resulting algorithm, called single best replacement (SBR), can be interpreted as a forward-backward extension of OLS sharing similarities with stepwise regression algorithms. Some structural properties of SBR are put forward. A fast and stable implementation is proposed. The approach is illustrated on two inverse problems involving highly correlated dictionaries. We show that SBR is very competitive with popular sparse algorithms in terms of tradeoff between accuracy and computation time.
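The forward-backward idea described above can be sketched naively: starting from an empty support, repeatedly apply the single insertion or removal that most lowers the l0-penalized least-squares cost, and stop when no single change helps. This is an unoptimized illustration that refits by least squares at every trial, not the fast and stable implementation the paper proposes; the function names and the toy data are assumptions.

```python
import numpy as np


def penalized_cost(A, y, support, lam):
    """Cost ||y - A_S x_S||^2 + lam * |S|, with x_S the least-squares fit on S."""
    if not support:
        return float(y @ y)
    cols = sorted(support)
    x_s, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
    residual = y - A[:, cols] @ x_s
    return float(residual @ residual) + lam * len(cols)


def single_best_replacement(A, y, lam, max_iter=100):
    """Greedy forward-backward search over supports (naive, unoptimized)."""
    support = set()
    best = penalized_cost(A, y, support, lam)
    for _ in range(max_iter):
        # Try flipping every index: insert it if absent, remove it if present.
        candidates = [(penalized_cost(A, y, support ^ {j}, lam), j)
                      for j in range(A.shape[1])]
        cost, j = min(candidates)
        if cost >= best:
            break  # no single replacement lowers the cost
        support ^= {j}
        best = cost
    return sorted(support), best


# Toy noiseless problem with two active atoms (indices 2 and 7).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
x_true = np.zeros(10)
x_true[[2, 7]] = [3.0, -2.0]
y = A @ x_true
support, cost = single_best_replacement(A, y, lam=0.5)
```

Because each iteration accepts a move only if it strictly lowers the cost, the loop terminates at a support that no single insertion or removal can improve.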


NeuroImage | 2014

Sparse representation based biomarker selection for schizophrenia with integrated analysis of fMRI and SNPs.

Hongbao Cao; Junbo Duan; Dongdong Lin; Yin Yao Shugart; Vince D. Calhoun; Yu-Ping Wang

Integrative analysis of multiple data types can take advantage of their complementary information and therefore may provide higher power to identify potential biomarkers that would be missed using individual data analysis. Because data modalities differ in nature, data integration is challenging. Here we address the data integration problem by developing a generalized sparse model (GSM) using weighting factors to integrate multi-modality data for biomarker selection. As an example, we applied the GSM model to a joint analysis of two types of schizophrenia data sets: 759,075 SNPs and 153,594 functional magnetic resonance imaging (fMRI) voxels in 208 subjects (92 cases/116 controls). To solve this small-sample-large-variable problem, we developed a novel sparse representation based variable selection (SRVS) algorithm, with the primary aim to identify biomarkers associated with schizophrenia. To validate the effectiveness of the selected variables, we performed multivariate classification followed by a ten-fold cross validation. We compared our proposed SRVS algorithm with an earlier sparse model based variable selection algorithm for integrated analysis. In addition, we compared with traditional statistical methods for univariate data analysis (Chi-squared test for SNP data and ANOVA for fMRI data). Results showed that our proposed SRVS method can identify novel biomarkers that show stronger capability in distinguishing schizophrenia patients from healthy controls. Moreover, better classification ratios were achieved using biomarkers from both types of data, suggesting the importance of integrative analysis.


BMC Bioinformatics | 2013

CNV-TV: A robust method to discover copy number variation from short sequencing reads

Junbo Duan; Ji-Gang Zhang; Hong-Wen Deng; Yu-Ping Wang

Background: Copy number variation (CNV) is an important structural variation (SV) in the human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. The next generation sequencing (NGS) technique promises a higher resolution detection of CNVs and several methods were recently proposed for realizing such a promise. However, the performances of these methods are not robust under some conditions, e.g., some of them may fail to detect CNVs of short sizes. There has been a strong demand for reliable detection of CNVs from high resolution NGS data. Results: A novel and robust method to detect CNV from short sequencing reads is proposed in this study. The detection of CNV is modeled as a change-point detection from the read depth (RD) signal derived from the NGS, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach is evaluated by comparison with several recently published methods on both simulated and real data from the 1000 Genomes Project. Conclusion: The experimental results showed that both the true positive rate and false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengths, when compared with several existing methods. Therefore, our proposed approach results in a more reliable detection of CNVs than the existing methods.
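To make the change-point view concrete, the sketch below fits a one-dimensional TV-penalized least-squares model to a toy read-depth signal and reports where the fitted signal jumps. It uses a generic projected-gradient iteration on the dual problem, which is an assumption chosen for brevity and not necessarily the solver used in CNV-TV; the function names, the tolerance, and the toy signal are likewise hypothetical.

```python
import numpy as np


def tv_denoise(y, lam, n_iter=2000):
    """Approximately solve min_x 0.5*||y - x||^2 + lam * sum|x[i+1] - x[i]|."""
    y = np.asarray(y, dtype=float)
    z = np.zeros(len(y) - 1)  # dual variable, one entry per adjacent pair
    for _ in range(n_iter):
        # Primal estimate from the dual variable: x = y - D^T z.
        x = y + np.diff(np.concatenate(([0.0], z, [0.0])))
        # Projected gradient step on the dual; 0.25 is at most 1 / ||D D^T||.
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)
    return y + np.diff(np.concatenate(([0.0], z, [0.0])))


def change_points(x, tol=1e-3):
    """Indices i where the fitted signal jumps between x[i-1] and x[i]."""
    return [int(i) + 1 for i in np.flatnonzero(np.abs(np.diff(x)) > tol)]


# Toy read-depth signal with a single gain starting at index 4.
read_depth = np.array([0, 0, 0, 0, 5, 5, 5, 5], dtype=float)
smooth = tv_denoise(read_depth, lam=0.5)
breaks = change_points(smooth)
```

On this noiseless toy signal the fit keeps the single jump and shrinks its height by `lam / 4` on each side, which shows the bias that the TV penalty introduces in exchange for a piecewise-constant estimate.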


BMC Medical Genomics | 2013

Integrating fMRI and SNP data for biomarker identification for schizophrenia with a sparse representation based variable selection method

Hongbao Cao; Junbo Duan; Dongdong Lin; Vince D. Calhoun; Yu-Ping Wang

Background: In recent years, both single-nucleotide polymorphism (SNP) array and functional magnetic resonance imaging (fMRI) have been widely used for the study of schizophrenia (SCZ). In addition, a few studies have been reported integrating both SNP data and fMRI data for comprehensive analysis. Methods: In this study, a novel sparse representation based variable selection (SRVS) method has been proposed and tested on a simulation data set to demonstrate its multi-resolution properties. Then the SRVS method was applied to an integrative analysis of two different SCZ data sets, a SNP data set and an fMRI data set, including 92 cases and 116 controls. Biomarkers for the disease were identified and validated with a multivariate classification approach followed by a leave one out (LOO) cross-validation. Then we compared the results with those of a previously reported sparse representation based feature selection method. Results: Results showed that biomarkers from our proposed SRVS method gave significantly higher classification accuracy in discriminating SCZ patients from healthy controls than that of the previously reported sparse representation method. Furthermore, using biomarkers from both data sets led to better classification accuracy than using a single type of biomarker, which suggests the advantage of integrative analysis of different types of data. Conclusions: The proposed SRVS algorithm is effective in identifying significant biomarkers for complicated diseases such as SCZ. Integrating different types of data (e.g. SNP and fMRI data) may identify complementary biomarkers benefitting the diagnostic accuracy of the disease.
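The leave one out validation mentioned above can be sketched as follows: each subject is held out once, a classifier is trained on the remaining subjects, and the held-out prediction is scored. The abstract does not name the multivariate classifier, so a nearest-centroid rule stands in for it here; the function names and the toy feature matrix are assumptions for illustration only.

```python
import numpy as np


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose training mean is closest."""
    labels = np.unique(train_y)
    centroids = [train_x[train_y == label].mean(axis=0) for label in labels]
    distances = [np.linalg.norm(sample - c) for c in centroids]
    return labels[int(np.argmin(distances))]


def leave_one_out_accuracy(features, labels):
    """Hold out each subject once, train on the rest, and score the guess."""
    hits = 0
    for i in range(len(labels)):
        keep = np.arange(len(labels)) != i
        guess = nearest_centroid_predict(features[keep], labels[keep], features[i])
        hits += int(guess == labels[i])
    return hits / len(labels)


# Toy data: two well-separated groups of three subjects, two features each.
features = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                     [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
accuracy = leave_one_out_accuracy(features, labels)
```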


Journal of Bioinformatics and Computational Biology | 2011

A Compressed Sensing Based Approach for Subtyping of Leukemia from Gene Expression Data

Wenlong Tang; Hongbao Cao; Junbo Duan; Yu-Ping Wang

With the development of genomic techniques, the demand for new methods that can handle high-throughput genome-wide data effectively is becoming stronger than ever before. Compressed sensing (CS) is an emerging approach in statistics and signal processing. With the CS theory, a signal can be uniquely reconstructed or approximated from its sparse representations, which can therefore better distinguish different types of signals. However, the application of the CS approach to genome-wide data analysis has rarely been investigated. We propose a novel CS-based approach for genomic data classification and test its performance in the subtyping of leukemia through gene expression analysis. The detection of subtypes of cancers such as leukemia according to different genetic makeups is significant, which holds promise for the individualization of therapies and improvement of treatments. In our work, four statistical features were employed to select significant genes for the classification. With our selected genes out of 7,129, the proposed CS method achieved a classification accuracy of 97.4% when evaluated with cross validation and 94.3% when evaluated with another independent data set. The robustness of the method to noise was also tested, giving good performance. Therefore, this work demonstrates that the CS method can effectively detect subtypes of leukemia, implying improved accuracy of diagnosis of leukemia.
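One common way to classify with sparse representations, shown here only as a generic sketch, is to code a test sample sparsely over the training samples and assign it to the class whose training columns reconstruct it best. The abstract does not describe the paper's classifier in this detail, so the orthogonal matching pursuit coder, the residual rule, the function names, and the tiny dictionary below are all assumptions.

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse code of y over columns of D."""
    residual, support = y.copy(), []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # nothing new to add
        support.append(j)
        x_s, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ x_s
    if support:
        coef[support] = x_s
    return coef


def classify(D, column_labels, y, n_nonzero=2):
    """Pick the class whose training columns best reconstruct y."""
    coef = omp(D, y, n_nonzero)
    errors = {}
    for label in np.unique(column_labels):
        mask = column_labels == label
        errors[label] = np.linalg.norm(y - D[:, mask] @ coef[mask])
    return min(errors, key=errors.get)


# Hypothetical dictionary: columns are unit-norm training samples.
s = 1 / np.sqrt(2)
D = np.array([[1.0, s, 0.0, 0.0],
              [0.0, s, 0.0, 0.0],
              [0.0, 0.0, 1.0, s],
              [0.0, 0.0, 0.0, s]])
column_labels = np.array([0, 0, 1, 1])
label_a = classify(D, column_labels, np.array([1.0, 0.5, 0.0, 0.0]))
label_b = classify(D, column_labels, np.array([0.0, 0.0, 0.2, 1.0]))
```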


Eurasip Journal on Bioinformatics and Systems Biology | 2013

Subtyping glioblastoma by combining miRNA and mRNA expression data using compressed sensing-based approach

Wenlong Tang; Junbo Duan; Ji-Gang Zhang; Yu-Ping Wang

In clinical practice, many diseases such as glioblastoma, leukemia, diabetes, and prostate cancer have multiple subtypes. Classifying subtypes accurately using genomic data will provide individualized treatments to target-specific disease subtypes. However, it is often difficult to obtain satisfactory classification accuracy using only one type of data, because the subtypes of a disease can exhibit similar patterns in one data type. Fortunately, multiple types of genomic data are often available due to the rapid development of genomic techniques. This raises the question of whether the classification performance can be significantly improved by combining multiple types of genomic data. In this article, we classified four subtypes of glioblastoma multiforme (GBM) with multiple types of genome-wide data (e.g., mRNA and miRNA expression) from The Cancer Genome Atlas (TCGA) project. We proposed a multi-class compressed sensing-based detector (MCSD) for this study. The MCSD was trained with data from TCGA and then applied to subtype GBM patients using an independent testing data set. We performed the classification on the same patient subjects with three data types, i.e., miRNA expression data, mRNA (or gene expression) data, and their combination. The classification accuracy is 69.1% with the miRNA expression data, 52.7% with mRNA expression data, and 90.9% with the combination of both mRNA and miRNA expression data. In addition, some biomarkers identified by the integrated approaches have been confirmed with results from the published literature. These results indicate that the combined analysis can significantly improve the accuracy of classifying GBM subtypes and identify potential biomarkers for disease diagnosis.


IEEE Transactions on Signal Processing | 2015

Homotopy based algorithms for L0-regularized least-squares

Charles Soussen; Jérôme Idier; Junbo Duan; David Brie

Sparse signal restoration is usually formulated as the minimization of a quadratic cost function $\|y - Ax\|_2^2$, where $A$ is a dictionary and $x$ is an unknown sparse vector. It is well-known that imposing an $\ell_0$ constraint leads to an NP-hard minimization problem. The convex relaxation approach has received considerable attention, where the $\ell_0$-norm is replaced by the $\ell_1$-norm. Among the many effective $\ell_1$ solvers, the homotopy algorithm minimizes $\|y - Ax\|_2^2 + \lambda\|x\|_1$ with respect to $x$ for a continuum of $\lambda$'s. It is inspired by the piecewise regularity of the $\ell_1$-regularization path, also referred to as the homotopy path. In this paper, we address the minimization problem $\|y - Ax\|_2^2 + \lambda\|x\|_0$ for a continuum of $\lambda$'s and propose two heuristic search algorithms for $\ell_0$-homotopy. Continuation Single Best Replacement is a forward-backward greedy strategy extending the Single Best Replacement algorithm, previously proposed for $\ell_0$-minimization at a given $\lambda$. The adaptive search of the $\lambda$-values is inspired by $\ell_1$-homotopy. $\ell_0$ Regularization Path Descent is a more complex algorithm exploiting the structural properties of the $\ell_0$-regularization path, which is piecewise constant with respect to $\lambda$. Both algorithms are empirically evaluated for difficult inverse problems involving ill-conditioned dictionaries. Finally, we show that they can be easily coupled with usual methods of model order selection.
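The piecewise-constant structure of the l0-regularization path can be seen on a tiny problem by brute force: compute the best squared residual for every sparsity level, then pick the level that minimizes residual plus penalty for each λ. Exhaustive search is only feasible for very small dictionaries, which is why the paper proposes heuristic algorithms instead; the function names and the toy identity dictionary are assumptions made for this sketch.

```python
from itertools import combinations

import numpy as np


def best_error_per_size(A, y):
    """E[k] = smallest squared residual over all supports of size k."""
    n = A.shape[1]
    errors = [float(y @ y)]  # k = 0: the zero vector
    for k in range(1, n + 1):
        best = np.inf
        for cols in combinations(range(n), k):
            sub = A[:, list(cols)]
            x_s, *_ = np.linalg.lstsq(sub, y, rcond=None)
            r = y - sub @ x_s
            best = min(best, float(r @ r))
        errors.append(best)
    return errors


def optimal_sparsity(errors, lam):
    """Sparsity k minimizing E[k] + lam * k for one value of lam."""
    return int(np.argmin([e + lam * k for k, e in enumerate(errors)]))


# Toy problem: identity dictionary, so E[k] drops the k largest squared entries.
A = np.eye(3)
y = np.array([3.0, 2.0, 1.0])
errors = best_error_per_size(A, y)
path = [optimal_sparsity(errors, lam) for lam in (0.5, 2.0, 6.0, 10.0)]
```

For this toy problem the optimal sparsity changes only at λ = 1, 4, and 9, and stays constant between those breakpoints.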


Bioinformatics and Biomedicine | 2011

Detection of copy number variation from next generation sequencing data with total variation penalized least square optimization

Junbo Duan; Ji-Gang Zhang; John Lefante; Hong-Wen Deng; Yu-Ping Wang

The detection of copy number variation is important to understand complex diseases such as autism, schizophrenia, and cancer. In this paper we propose a method to detect copy number variation from next generation sequencing data. Compared with conventional methods to detect copy number variation like array comparative genomic hybridization (aCGH), next generation sequencing data provide higher resolution of genomic variations. There are many methods to detect copy number variation from next generation sequencing data, and most of them are based on statistical hypothesis testing. In this paper, we consider this problem from an optimization point of view. The proposed method is based on optimizing a total variation penalized least square criterion, which involves the ℓ1 norm. Inspired by the analytical study of a statics system, we propose an iterative algorithm to find the optimal solution of this optimization problem. The comparative study with other existing methods on simulated data demonstrates that our method can detect relatively small copy number variants (low copy number and small single copy length) with a low false positive rate.


Bioinformatics and Biomedicine | 2012

Biomarker identification for diagnosis of schizophrenia with integrated analysis of fMRI and SNPs

Hongbao Cao; Dongdong Lin; Junbo Duan; Yu-Ping Wang; Vince D. Calhoun

It is important to identify significant biomarkers such as SNPs for medical diagnosis and treatment. However, the size of a biological sample is usually far less than the number of measurements, which makes the problem more challenging. To overcome this difficulty, we propose a sparse representation based variable selection (SRVS) approach. A simulated data set was first tested to demonstrate the advantages and properties of the proposed method. Then, we applied the algorithm to a joint analysis of 759,075 SNPs and 153,594 functional magnetic resonance imaging (fMRI) voxels in 208 subjects (92 cases/116 controls) to identify significant biomarkers for schizophrenia (SZ). When compared with previous studies, our proposed method located 20 genes out of the top 45 SZ genes that are publicly reported. We also detected some interesting functional brain regions from the fMRI study. In addition, a leave one out (LOO) cross-validation was performed and the results were compared with those of a previously reported method, which showed that our method gave significantly higher classification accuracy. In addition, the identification accuracy with integrative analysis is much better than that of using a single type of data, suggesting that integrative analysis may lead to better diagnostic accuracy by combining complementary SNP and fMRI data.

Collaboration


Top Co-Authors

Dongdong Lin

The Mind Research Network


David Brie

University of Lorraine

Mingxi Wan

Xi'an Jiaotong University
