Publication


Featured research published by Baichuan Deng.


Analytica Chimica Acta | 2016

Chemometric methods in data processing of mass spectrometry-based metabolomics: A review

Lunzhao Yi; Naiping Dong; Yong-Huan Yun; Baichuan Deng; Dabing Ren; Shao Liu; Yi-Zeng Liang

This review focuses on recent and potential advances in chemometric methods in relation to data processing in metabolomics, especially for data generated from mass spectrometric techniques. Metabolomics is gradually being regarded as a valuable and promising biotechnology rather than an ambitious advancement. Herein, we outline significant developments in metabolomics, especially in combination with modern chemical analysis techniques and dedicated statistical and chemometric data-analysis strategies. Advanced skills in the preprocessing of raw data, identification of metabolites, variable selection, and modeling are illustrated. We believe that insights from these developments will help narrow the gap between the original dataset and current biological knowledge. We also discuss the limitations and perspectives of extracting information from high-throughput datasets.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011

Recipe for Uncovering Predictive Genes Using Support Vector Machines Based on Model Population Analysis

Hong-Dong Li; Yi-Zeng Liang; Qing-Song Xu; Dong-Sheng Cao; Bin-Bin Tan; Baichuan Deng; Chen-Chen Lin

Selecting a small number of informative genes for microarray-based tumor classification is central to cancer prediction and treatment. Based on model population analysis, here we present a new approach, called Margin Influence Analysis (MIA), designed to work with support vector machines (SVM) for selecting informative genes. The rationale for performing margin influence analysis lies in the fact that the margin of support vector machines is an important factor which underlies the generalization performance of SVM models. Briefly, MIA can reveal genes which have a statistically significant influence on the margin by using the Mann-Whitney U test. The reason for using the Mann-Whitney U test rather than the two-sample t test is that the Mann-Whitney U test is a nonparametric method without any distribution-related assumptions and is also robust. Using two publicly available cancer microarray data sets, it is demonstrated that MIA typically selects a small number of margin-influencing genes and achieves classification accuracy comparable to that reported in the literature. The distinguishing features and outstanding performance may make MIA a good alternative for gene selection in high-dimensional microarray data. (The source code in MATLAB with GNU General Public License Version 2.0 is freely available at http://code.google.com/p/mia2009/).
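To illustrate the statistical step described in this abstract, the following is a minimal sketch of a two-sided Mann-Whitney U test, assuming the SVM margins of the sub-models have already been computed. The margin values, the function names, and the use of the normal approximation without tie correction are illustrative assumptions, not the authors' MATLAB implementation.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided test via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (min(u1, u2) - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value

# Hypothetical margins of sub-models that include / exclude one gene.
margins_with_gene = [1.9, 2.1, 2.0, 2.3, 2.2, 2.4]
margins_without_gene = [1.2, 1.5, 1.4, 1.6, 1.3, 1.7]

u_stat, p_value = mann_whitney_u(margins_with_gene, margins_without_gene)
print(u_stat, p_value)
```

A small p-value here would suggest that including the gene shifts the margin distribution, which is the kind of evidence the abstract describes MIA as using to rank genes.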


Journal of Cheminformatics | 2015

ChemDes: an integrated web-based platform for molecular descriptor and fingerprint computation

Jie Dong; Dong Sheng Cao; Hongyu Miao; Shao Liu; Baichuan Deng; Yong Huan Yun; Ning Ning Wang; Ai Ping Lu; Wen Bin Zeng; Alex F. Chen

Background: Molecular descriptors and fingerprints have been routinely used in QSAR/SAR analysis, virtual drug screening, compound search/ranking, drug ADME/T prediction and other drug discovery processes. Since the calculation of such quantitative representations of molecules may require substantial computational skills and efforts, several tools have been previously developed to make an attempt to ease the process. However, there are still several hurdles for users to overcome to fully harness the power of these tools. First, most of the tools are distributed as standalone software or packages that require necessary configuration or programming efforts of users. Second, many of the tools can only calculate a subset of molecular descriptors, and the results from multiple tools need to be manually merged to generate a comprehensive set of descriptors. Third, some packages only provide application programming interfaces and are implemented in different computer languages, which pose additional challenges to the integration of these tools.

Results: A freely available web-based platform, named ChemDes, is developed in this study. It integrates multiple state-of-the-art packages (i.e., Pybel, CDK, RDKit, BlueDesc, Chemopy, PaDEL and jCompoundMapper) for computing molecular descriptors and fingerprints. ChemDes not only provides friendly web interfaces to relieve users from burdensome programming work, but also offers three useful and convenient auxiliary tools for format converting, MOPAC optimization and fingerprint similarity calculation. Currently, ChemDes has the capability of computing 3679 molecular descriptors and 59 types of molecular fingerprints.

Conclusion: ChemDes provides users an integrated and friendly tool to calculate various molecular descriptors and fingerprints. It is freely available at http://www.scbdd.com/chemdes. The source code of the project is also available as a supplementary file.


Analytica Chimica Acta | 2015

A new strategy to prevent over-fitting in partial least squares models based on model population analysis

Baichuan Deng; Yong-Huan Yun; Yi-Zeng Liang; Dong-Sheng Cao; Qing-Song Xu; Lunzhao Yi; Xin Huang

Partial least squares (PLS) is one of the most widely used methods for chemical modeling. However, like many other methods with tunable parameters, it has a strong tendency to over-fit. Thus, a crucial step in PLS model building is to select the optimal number of latent variables (nLVs). Cross-validation (CV) is the most popular method for PLS model selection because it selects a model from the perspective of prediction ability. However, a clear minimum of prediction errors may not be obtained in CV, which makes the model selection difficult. To solve the problem, we proposed a new strategy for PLS model selection which combines the cross-validated coefficient of determination (Q²cv) and model stability (S). S is defined as the stability of PLS regression vectors, which is obtained using model population analysis (MPA). The results show that, when a clear maximum of Q²cv is not obtained, S can provide additional information about over-fitting and helps in finding the optimal nLVs. Compared with other regression-vector-based indicators such as the Euclidean 2-norm (B2), the Durbin-Watson statistic (DW) and the jaggedness (J), S is more sensitive to over-fitting. The model selected by our method has both good prediction ability and stability.


Analytica Chimica Acta | 2016

A bootstrapping soft shrinkage approach for variable selection in chemical modeling.

Baichuan Deng; Yong-Huan Yun; Dong-Sheng Cao; Yu-Long Yin; Wei-Ting Wang; Hongmei Lu; Qianyi Luo; Yi-Zeng Liang

In this study, a new variable selection method called the bootstrapping soft shrinkage (BOSS) method is developed. It is derived from the idea of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of variables are determined based on the absolute values of regression coefficients. WBS is applied according to the weights to generate sub-models, and MPA is used to analyze the sub-models to update the weights of the variables. The optimization procedure follows the rule of soft shrinkage, in which less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The optimal variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected. The method was tested on three groups of near infrared (NIR) spectroscopic datasets, i.e. corn datasets, diesel fuels datasets and soy datasets. Three high-performing variable selection methods, i.e. Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), are used for comparison. The results show that BOSS is promising, with improved prediction performance. The MATLAB code for implementing BOSS is freely available on the website: http://www.mathworks.com/matlabcentral/fileexchange/52770-boss.
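The weighted bootstrap sampling step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the weights are hypothetical, the function name is invented here, and the PLS fitting and weight-update steps of BOSS are deliberately left out.

```python
import random

def weighted_bootstrap_sample(weights, rng):
    """Draw len(weights) variable indices with replacement, with probability
    proportional to weight, and return the sorted unique indices."""
    n = len(weights)
    drawn = rng.choices(range(n), weights=weights, k=n)
    return sorted(set(drawn))

# Hypothetical variable weights; the last variable has weight zero.
weights = [0.6, 0.3, 0.1, 0.0]
rng = random.Random(0)  # fixed seed so the run is repeatable

# Count how often each variable enters a sub-model over many draws.
counts = [0] * len(weights)
for _ in range(1000):
    for index in weighted_bootstrap_sample(weights, rng):
        counts[index] += 1

print(counts)
```

In this sketch a variable with a small but nonzero weight still enters some sub-models, which mirrors the soft shrinkage idea of down-weighting rather than eliminating variables outright.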


RSC Advances | 2015

Application of near infrared spectroscopy for the rapid determination of epimedin A, B, C and icariin in Epimedium

Qianyi Luo; Yong-Huan Yun; Wei Fan; Jianhua Huang; Lixian Zhang; Baichuan Deng; Hongmei Lu

A method for rapid quantitative analysis of epimedin A, B, C and icariin in Epimedium was developed based on Fourier transform near infrared (FT-NIR) spectroscopy, with high performance liquid chromatography-diode array detection (HPLC-DAD) adopted as the reference method. Multivariate calibration models were built by partial least squares regression (PLSR) based on the full absorbance spectra (10 000–4000 cm⁻¹) or only the most informative key variables selected by the competitive adaptive reweighted sampling (CARS) method. In comparison, the accuracy of the CARS-PLSR method was apparently higher than that of full-spectrum PLSR for the four investigated flavonoids. For CARS-PLSR, the coefficients of determination (R²) for prediction were 0.8969, 0.8810, 0.9273 and 0.9325 and the root mean square errors of prediction (RMSEP) were 0.1789, 0.2572, 1.2872 and 0.3615 for epimedin A, B, C and icariin, respectively. The good performance indicates that the combination of NIR spectroscopy with CARS-PLSR is an effective method for determination of epimedin A, B, C and icariin in Epimedium, with the advantages of being fast, economical and nondestructive compared to traditional chemical methods.
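For readers unfamiliar with the two figures of merit reported in this abstract, here is a minimal sketch of how RMSEP and a coefficient of determination are commonly computed on a prediction set. The reference and predicted values are made up for illustration, and the R² definition used (1 minus residual over total sum of squares) is an assumption; the paper may use a different convention such as the squared correlation.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical reference (e.g. HPLC) values and model predictions.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(rmsep(reference, predicted), r_squared(reference, predicted))
```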


RSC Advances | 2014

Metabolomic identification of novel biomarkers of nasopharyngeal carcinoma

Lunzhao Yi; Naiping Dong; Shuting Shi; Baichuan Deng; Yong-Huan Yun; Zhibiao Yi; Yi Zhang

This paper introduces a new strategy for identifying novel metabolic biomarkers of nasopharyngeal carcinoma (NPC). Here, we combined gas chromatography-mass spectrometry (GC-MS) metabolic profiling with three partial least squares-discriminant analysis (PLS-DA) based variable selection methods to screen the metabolic biomarkers of NPC. We found that the variable importance on projection (VIP) method exhibited better efficiency than the coefficients β and the loadings plot for the metabolomics data set of 39 NPC patients and 40 healthy controls. In addition, we proved that the area under the receiver operating characteristic curve (AUC) was more sensitive than the correct rate for evaluating the discrimination ability of the classical models. Therefore, three novel candidate biomarkers, glucose, glutamic acid and pyroglutamate, were identified, with a correct rate of 97.47% and an AUC value of 97.40%. Our results suggested that the metabolic disorders of NPC were mainly reflected in glycolysis and glutamate metabolism; in addition, metabolic levels of the related metabolic pathways, such as the TCA cycle and lipid metabolism, may affect each other. We believe that the findings of these novel metabolites will be very helpful for early diagnosis and subsequent pathogenesis research of NPC.
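The claim that AUC can be more sensitive than the correct rate can be illustrated with a small sketch. The labels, scores, 0.5 threshold, and function names below are hypothetical and are not taken from the study; the point is only that two score sets with the same correct rate can differ in AUC.

```python
def correct_rate(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return hits / len(labels)

def auc(labels, scores):
    """Probability that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = patient, 0 = control; both score sets misclassify only the third sample.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.8, 0.05, 0.3, 0.2, 0.1]

for scores in (scores_a, scores_b):
    print(correct_rate(labels, scores), auc(labels, scores))
```

Both score sets give the same correct rate, but the second ranks one patient below every control, which only the AUC reflects.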


Journal of Separation Science | 2015

Discrimination of Acori Tatarinowii Rhizoma and Acori Calami Rhizoma based on quantitative gas chromatographic fingerprints and chemometric methods.

Xiaojuan Zhang; Lunzhao Yi; Baichuan Deng; Lian Chen; Shuting Shi; Yongliang Zhuang; Yi Zhang

This study was conducted to investigate the chemical differences between Acori Tatarinowii Rhizoma and Acori Calami Rhizoma using gas chromatography with mass spectrometry and chemometric methods. Quantitative fingerprints were established. A total of 90 volatile compounds were identified and quantified using heuristic evolving latent projection and retention index. An efficient model based on partial least squares-discriminant analysis coupled with variable iterative space shrinkage approach was developed to distinguish Acori Tatarinowii Rhizoma from Acori Calami Rhizoma. The correct rate was 95.83%, and the area under the receiver operating characteristic curve was 100%. Finally, three volatiles, namely, camphor, longicyclene, and δ-cadinene, were selected as key discrimination factors between Acori Tatarinowii Rhizoma and Acori Calami Rhizoma. The proposed protocol can serve as a valid strategy for quality control and screening of potential bioactive components of herbal medicines.


Biotechnology Advances | 2014

WITHDRAWN: Recent advances in chemometric methods for plant metabolomics: A review

Lunzhao Yi; Naiping Dong; Yong-Huan Yun; Baichuan Deng; Shao Liu; Yi Zhang; Yi-Zeng Liang

This article has been withdrawn at the request of the author(s) and/or editor. The Publisher apologizes for any inconvenience this may cause. The full Elsevier Policy on Article Withdrawal can be found at http://www.elsevier.com/locate/withdrawalpolicy.


RSC Advances | 2015

Iteratively variable subset optimization for multivariate calibration

Wei-Ting Wang; Yong-Huan Yun; Baichuan Deng; Wei Fan; Yi-Zeng Liang

Based on the theory that a large partial least squares (PLS) regression coefficient on autoscaled data indicates an important variable, a novel strategy for variable selection called iteratively variable subset optimization (IVSO) is proposed in this study. In addition, we take into consideration that the optimal number of latent variables generated by cross-validation will make a great difference to the regression coefficients and sometimes the difference can even vary by several orders of magnitude. In this work, the regression coefficients generated in every sub-model are normalized to remove the influence. In each iterative round, the regression coefficients of each variable obtained from the sub-models are summed to evaluate their importance level. A two-step procedure including weighted binary matrix sampling (WBMS) and sequential addition is employed to eliminate uninformative variables gradually and gently in a competitive way and reduce the risk of losing important variables. Thus, IVSO can achieve high stability. Investigated by using one simulated dataset and two NIR datasets, IVSO shows much better prediction ability than two other outstanding and commonly used methods, Monte Carlo uninformative variable elimination (MC-UVE) and competitive adaptive reweighted sampling (CARS). The MATLAB code for implementing IVSO is available in the ESI.
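The normalize-then-sum step described in this abstract can be sketched as below. The abstract does not state which normalization is used, so dividing the absolute coefficients by the largest absolute coefficient in each sub-model is an assumption made here, and the coefficient values and function names are hypothetical.

```python
def normalize(coefs):
    """Scale absolute coefficients so the largest one in the sub-model is 1.
    (Assumed normalization; the abstract does not specify the exact form.)"""
    peak = max(abs(c) for c in coefs.values())
    return {var: abs(c) / peak for var, c in coefs.items()}

def importance(sub_models, n_vars):
    """Sum normalized coefficients per variable across all sub-models."""
    totals = [0.0] * n_vars
    for coefs in sub_models:
        for var, value in normalize(coefs).items():
            totals[var] += value
    return totals

# Two hypothetical sub-models whose raw coefficients differ by orders of
# magnitude, mapping variable index -> regression coefficient.
sub_models = [
    {0: 200.0, 1: -100.0},
    {0: 0.02, 2: 0.01},
]

print(importance(sub_models, n_vars=3))
```

Without the normalization, the first sub-model would dominate the sum simply because its coefficients are numerically larger, which is the effect the abstract says the normalization is meant to remove.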

Collaboration

Dive into Baichuan Deng's collaboration.

Top Co-Authors

Yi-Zeng Liang, Central South University
Yong-Huan Yun, Central South University
Lunzhao Yi, Kunming University of Science and Technology
Jinping Deng, South China Agricultural University
Yulong Yin, Chinese Academy of Sciences
Chengquan Tan, South China Agricultural University
Dong-Sheng Cao, Central South University
Hongmei Lu, Central South University
Qing-Song Xu, Central South University
Dabing Ren, Kunming University of Science and Technology