Publication


Featured research published by Hong-Dong Li.


Journal of Computational Chemistry | 2009

A new strategy of outlier detection for QSAR/QSPR.

Dong-Sheng Cao; Yi-Zeng Liang; Qing-Song Xu; Hong-Dong Li; Xian Chen

A crucial step in building a high-performance QSAR/QSPR model is the detection of outliers. Detecting outliers in a multivariate point cloud is not trivial, especially when several outliers coexist. The classical identification methods do not always find them, because they are based on the sample mean and covariance matrix, which are themselves influenced by the outliers. Moreover, existing methods emphasize only some types of outliers rather than all of them. To avoid these problems and detect all kinds of outliers simultaneously, we provide a new strategy based on Monte‐Carlo cross‐validation, termed the MC method. The MC method inherently provides a feasible way to detect different kinds of outliers by establishing many cross‐predictive models. The distribution of predictive residuals thus obtained appears to reduce the risk caused by the masking effect. In addition, a new display is proposed, in which the absolute value of the mean of the predictive residuals is plotted against the standard deviation of the predictive residuals. The plot divides the data into normal samples, y direction outliers and X direction outliers. Several examples are used to demonstrate the detection ability of the MC method through comparison with different diagnostic methods.
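The residual diagnostics described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: a univariate least-squares line stands in for the QSAR/QSPR model, and the function names, the toy data, the 70% training fraction and the 300 splits are all assumptions made for the example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (assumed stand-in for the model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def mccv_residual_stats(xs, ys, n_splits=300, train_frac=0.7, seed=0):
    """Return {sample index: (|mean residual|, std of residuals)} over random splits."""
    rng = random.Random(seed)
    n = len(xs)
    n_train = int(train_frac * n)
    residuals = {i: [] for i in range(n)}
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Only held-out samples contribute a predictive residual in this split.
        for i in test:
            residuals[i].append(ys[i] - (a + b * xs[i]))
    return {
        i: (abs(statistics.fmean(r)), statistics.pstdev(r))
        for i, r in residuals.items() if r
    }


# Toy data (invented): a linear trend with one planted y direction outlier at index 7.
data_rng = random.Random(1)
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0.0, 0.5) for x in xs]
ys[7] += 20.0

stats = mccv_residual_stats(xs, ys)
suspect = max(stats, key=lambda i: stats[i][0])
print(suspect, stats[suspect])
```

Plotting the first element of each pair against the second gives the kind of display the abstract mentions; in this toy case the planted sample stands out through its large absolute mean residual.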


Analytica Chimica Acta | 2012

Random frog: an efficient reversible jump Markov Chain Monte Carlo-like approach for variable selection with applications to gene selection and disease classification.

Hong-Dong Li; Qing-Song Xu; Yi-Zeng Liang

The identification of disease-relevant genes represents a challenge in microarray-based disease diagnosis where the sample size is often limited. Among established methods, reversible jump Markov Chain Monte Carlo (RJMCMC) methods have proven to be quite promising for variable selection. However, the design and application of an RJMCMC algorithm requires, for example, special criteria for prior distributions. Also, the simulation from joint posterior distributions of models is computationally expensive, and may even be mathematically intractable. These disadvantages may limit the applications of RJMCMC algorithms. Therefore, algorithms are needed that possess the advantages of RJMCMC methods and are also efficient and easy to follow for selecting disease-associated genes. Here we report an RJMCMC-like method, called random frog, that possesses the advantages of RJMCMC methods and is much easier to implement. Using the colon and the estrogen gene expression datasets, we show that random frog is effective in identifying discriminating genes. The top 2 ranked genes for colon and estrogen are Z50753, U00968, and Y10871_at, Z22536_at, respectively. (The source code with GNU General Public License Version 2.0 is freely available to non-commercial users at: http://code.google.com/p/randomfrog/.)


Metabolomics | 2010

Recipe for revealing informative metabolites based on model population analysis

Hong-Dong Li; Mao-Mao Zeng; Bin-Bin Tan; Yi-Zeng Liang; Qing-Song Xu; Dong-Sheng Cao

An important application of metabolic profiles is to discover informative metabolites/biomarkers which are predictive of a clinical outcome under investigation. Therefore, there is a need to develop a statistically efficient method for screening such metabolites from the candidates. The most commonly used criterion to assess variable (metabolite) importance may be the P value obtained by performing a t test on each metabolite alone, without considering the influence of other variables. In this work, a new strategy, called subwindow permutation analysis (SPA) coupled with partial least squares linear discriminant analysis (PLSLDA), is developed for statistical assessment of variable importance. The main contribution of SPA is that, unlike the t test, it can output a conditional P value by implicitly taking into account the synergetic effect of all the other variables. In this sense, the conditional P value could to some extent help locate a good combination of informative variables. When applied to two metabolic datasets (type 2 diabetes mellitus data and childhood overweight data), it is shown that the performance of both the unsupervised principal component analysis (PCA) and the supervised PLSLDA is greatly improved when using the informative metabolites revealed by SPA. The source code for implementing SPA in both MATLAB and R (R package for both Linux and Windows) is freely available at: http://code.google.com/p/spa2010/downloads/list.


Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy | 2013

An efficient method of wavelength interval selection based on random frog for multivariate spectral calibration.

Yong-Huan Yun; Hong-Dong Li; Leslie R. E. Wood; Wei Fan; Jia-Jun Wang; Dong-Sheng Cao; Qing-Song Xu; Yi-Zeng Liang

Wavelength selection is a critical step for producing better prediction performance when applied to spectral data. Considering the fact that vibrational and rotational spectra have continuous features of spectral bands, we propose a novel method of wavelength interval selection based on random frog, called interval random frog (iRF). To obtain all the possible continuous intervals, spectra are first divided into intervals by moving a window of fixed width over the whole spectrum. These overlapping intervals are ranked by applying random frog coupled with PLS, and the optimal ones are chosen. This method has been applied to two near-infrared spectral datasets, displaying higher efficiency in wavelength interval selection than other methods. The source code of iRF can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
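The first step described in this abstract, sliding a fixed-width window over the spectrum to enumerate all overlapping intervals, can be sketched as below. The function name and the half-open index convention are assumptions for illustration; the ranking of intervals by random frog and PLS is not reproduced here.

```python
def moving_window_intervals(n_wavelengths, width):
    """Return every overlapping interval [start, stop) of the given fixed width."""
    if not 1 <= width <= n_wavelengths:
        raise ValueError("width must be between 1 and n_wavelengths")
    # The window start advances one wavelength at a time, so neighbours overlap.
    return [(start, start + width) for start in range(n_wavelengths - width + 1)]


intervals = moving_window_intervals(n_wavelengths=6, width=3)
print(intervals)
```

A spectrum with p wavelengths and a window of width w yields p - w + 1 candidate intervals under this convention.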


Analytica Chimica Acta | 2014

A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration

Yong-Huan Yun; Wei-Ting Wang; Min-Li Tan; Yi-Zeng Liang; Hong-Dong Li; Dong-Sheng Cao; Hongmei Lu; Qing-Song Xu

With high-dimensional datasets, creating effective methods that can select an optimal variable subset is a great challenge. In this study, a strategy that considers the possible interaction effect among variables through random combinations was proposed, called iteratively retaining informative variables (IRIV). Moreover, the variables are classified into four categories: strongly informative, weakly informative, uninformative and interfering variables. On this basis, IRIV retains both the strongly and weakly informative variables in every iterative round until no uninformative and interfering variables exist. Three datasets were employed to investigate the performance of IRIV coupled with partial least squares (PLS). The results show that IRIV is a good alternative variable selection strategy when compared with three outstanding and frequently used variable selection methods, namely genetic algorithm-PLS, Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS) and competitive adaptive reweighted sampling (CARS). The MATLAB source code of IRIV can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
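The abstract names four variable categories but does not state the decision rule. One plausible reading, assumed here purely for illustration, compares the mean cross-validated error of models that include a variable with that of models that exclude it, and uses a significance test to separate strong from weak effects. The function name, its arguments and the 0.05 threshold are hypothetical.

```python
def classify_variable(mean_err_included, mean_err_excluded, p_value, alpha=0.05):
    """Label a variable from model error with and without it (assumed rule)."""
    helps = mean_err_included < mean_err_excluded  # including the variable lowers error
    significant = p_value < alpha                  # the difference is statistically clear
    if helps:
        return "strongly informative" if significant else "weakly informative"
    return "interfering" if significant else "uninformative"


# Invented numbers: including the variable lowers the error, and clearly so.
label = classify_variable(mean_err_included=0.21, mean_err_excluded=0.35, p_value=0.001)
print(label)
```

Under this reading, each round would keep the variables labelled strongly or weakly informative and drop the rest, which matches the retention step described in the abstract.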


Metabolomics | 2010

Identification of free fatty acids profiling of type 2 diabetes mellitus and exploring possible biomarkers by GC-MS coupled with chemometrics

Bin-Bin Tan; Yizeng Liang; Lunzhao Yi; Hong-Dong Li; Zhiguang Zhou; Xiaoyan Ji; Jiahui Deng

Free fatty acids (FFAs), which are considered to be closely related to type 2 diabetes mellitus (T2DM), are not only the main energy source as nutrients, but also signaling molecules in insulin secretion. In this study, gas chromatography–mass spectrometry (GC–MS) coupled with two chemometric resolution methods, heuristic evolving latent projections (HELP) and selective ion analysis (SIA), was successfully applied to investigate plasma FFA profiling of T2DM. In total, twenty-three FFAs were identified and quantified. The results showed that the HELP and SIA methods could be used to effectively handle overlapping peaks of GC–MS data and hence improve the qualitative and quantitative accuracy. Furthermore, a newly proposed competitive adaptive reweighted sampling (CARS) method coupled with partial least squares linear discriminant analysis (PLS-LDA) was introduced to seek the potential biomarkers. Finally, three fatty acids, oleic acid (OLA C18:1n-9), α-linolenic acid (ALA C18:3n-3), and eicosapentaenoic acid (EPA C20:5n-3), were identified as the potential biomarkers of T2DM for their powerful ability to discriminate T2DM patients from healthy controls. The study indicated that GC–MS combined with chemometric methods is a useful strategy to analyze metabolites and further screen the potential biomarkers of T2DM.


Journal of Pharmaceutical and Biomedical Analysis | 2010

Plasma metabolic fingerprinting of childhood obesity by GC/MS in conjunction with multivariate statistical analysis

Mao-Mao Zeng; Yi-Zeng Liang; Hong-Dong Li; Mei Wang; Bing Wang; Xian Chen; Neng Zhou; Dong-Sheng Cao; Jing Wu

Metabolic fingerprinting is a powerful tool for exploring systemic metabolic perturbations and potential biomarkers, and thus may shed light on the pathophysiological mechanism of diseases. In this work, a new strategy of metabolic fingerprinting was proposed to explore the disturbances of metabolic patterns and biomarker candidates of childhood obesity. Plasma samples from children with normal weight, overweight and obesity were first profiled by GC/MS. ULDA (uncorrelated linear discriminant analysis) then revealed that the metabolic patterns of the three groups were different. Furthermore, several metabolites, namely isoleucine, glyceric acid, serine, 2,3,4-trihydroxybutyric acid and phenylalanine, were screened as potential biomarkers of childhood obesity by both ULDA and CCA (canonical correlation analysis). CCA also shows satisfactory correlation between the metabolic patterns and clinical parameters, and the results further suggest that WHR (waist-hip ratio), together with TG (total triglycerides), TC (total cholesterol), HDL (high density lipoprotein) and LDL (low density lipoprotein), were the parameters most closely associated with the metabolic perturbations of childhood obesity. In clinical practice, these parameters therefore deserve more attention than the routinely monitored BMI (body-mass index) when dealing with metabolic disturbances of childhood obesity. The results have demonstrated that the proposed metabolic fingerprinting approach may be a useful tool for discovering metabolic abnormalities and possible biomarkers for childhood obesity.


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2011

Recipe for Uncovering Predictive Genes Using Support Vector Machines Based on Model Population Analysis

Hong-Dong Li; Yi-Zeng Liang; Qing-Song Xu; Dong-Sheng Cao; Bin-Bin Tan; Baichuan Deng; Chen-Chen Lin

Selecting a small number of informative genes for microarray-based tumor classification is central to cancer prediction and treatment. Based on model population analysis, here we present a new approach, called Margin Influence Analysis (MIA), designed to work with support vector machines (SVM) for selecting informative genes. The rationale for performing margin influence analysis lies in the fact that the margin of support vector machines is an important factor which underlies the generalization performance of SVM models. Briefly, MIA could reveal genes which have statistically significant influence on the margin by using the Mann-Whitney U test. The reason for using the Mann-Whitney U test rather than the two-sample t test is that the Mann-Whitney U test is a nonparametric test method without any distribution-related assumptions and is also a robust method. Using two publicly available cancer microarray data sets, it is demonstrated that MIA could typically select a small number of margin-influencing genes and further achieve classification accuracy comparable to that reported in the literature. The distinguished features and outstanding performance may make MIA a good alternative for gene selection of high dimensional microarray data. (The source code in MATLAB with GNU General Public License Version 2.0 is freely available at http://code.google.com/p/mia2009/.)
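The Mann-Whitney U test that this abstract relies on can be sketched in plain Python as below. This is a generic textbook version, not the MIA code: the p value uses the normal approximation without a tie correction, which is an assumption that is only reasonable for moderately large samples, and the margin values in the usage lines are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U statistic of sample a, two-sided p from the normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumption: no tie correction
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Invented margins of models built without and with one hypothetical gene.
margins_without = [0.8, 0.9, 1.0, 1.1, 0.7, 0.95]
margins_with = [1.4, 1.5, 1.3, 1.6, 1.45, 1.35]
u, p = mann_whitney_u(margins_without, margins_with)
print(u, p)
```

Because the test works on ranks only, a few extreme margin values do not dominate the result, which is the robustness argument the abstract makes for preferring it to the t test.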


Journal of Chemometrics | 2010

Prediction of aqueous solubility of druglike organic compounds using partial least squares, back-propagation network and support vector machine

Dong-Sheng Cao; Qing-Song Xu; Yi-Zeng Liang; Xian Chen; Hong-Dong Li

Aqueous solubility of drug compounds plays a very important role in drug research and development. In this study, we have collected 225 diverse druglike molecules with accurate aqueous solubility. Three commonly used methods, namely partial least squares (PLS), back‐propagation network (BPN) and support vector regression (SVR), were employed to model the quantitative structure–property relationship (QSPR) for the aqueous solubility of 180 druglike compounds. Twenty-eight molecular descriptors were used to model the aqueous solubility of the drugs. In order to obtain a reliable and robust aqueous solubility prediction, a novel outlier detection method was employed to simultaneously detect all outliers in the established models. According to the Organization for Economic Co‐operation and Development (OECD) principles, the QSPR models were checked by both internal and external statistical validation to ensure both reliability and predictive ability. The results indicate that all three models can provide good predictive ability for drug aqueous solubility. Furthermore, it was found that the predictive ability of SVR is superior to that of PLS and BPN, and that the 28 selected molecular descriptors could give a reliable and direct interpretation of the aqueous solubility.


Analyst | 2013

A perspective demonstration on the importance of variable selection in inverse calibration for complex analytical systems

Yong-Huan Yun; Yi-Zeng Liang; Gui-Xiang Xie; Hong-Dong Li; Dong-Sheng Cao; Qing-Song Xu

Classical calibration and inverse calibration are two kinds of multivariate calibration in chemical modeling. They use strategies of modeling in component spectral space and in measured variable space, respectively. However, the intrinsic difference between these two calibration models has not been fully investigated. Besides, in the case of complex analytical systems, the net analyte signal (NAS) cannot be well defined in inverse calibration due to the existence of uninformative and/or interfering variables. Therefore, application of the NAS cannot improve the predictive performance for this kind of calibration, since it is essentially a full-spectrum technique. From our perspective, variable selection can significantly improve the predictive performance by removing uninformative and/or interfering variables. Although the need for variable selection in the inverse calibration model has already been experimentally demonstrated, it has not attracted much attention. In this study, we first clarify the intrinsic difference between these two calibration models and then use a new perspective to demonstrate the importance of variable selection in the inverse calibration model for complex analytical systems. In addition, we have experimentally validated our viewpoint through the use of one UV dataset and two generated near infrared (NIR) datasets.
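For orientation, the two calibration models contrasted in this abstract are commonly written as follows. The notation is an assumption, following the usual textbook convention rather than anything taken from the paper: X is the matrix of measured spectra, C the concentration matrix, S the pure-component spectra, c the concentration vector of the analyte of interest, b the regression vector, and E and e the residuals.

```latex
% Classical calibration: spectra modeled in component spectral space
\mathbf{X} = \mathbf{C}\,\mathbf{S}^{\mathsf{T}} + \mathbf{E}

% Inverse calibration: concentration modeled in measured variable space
\mathbf{c} = \mathbf{X}\,\mathbf{b} + \mathbf{e}

% Inverse calibration after variable selection, keeping only the column subset \mathcal{S}
\mathbf{c} = \mathbf{X}_{\mathcal{S}}\,\mathbf{b}_{\mathcal{S}} + \mathbf{e}
```

In the third form the uninformative or interfering columns of X no longer enter the regression at all, which is the effect the abstract attributes to variable selection.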

Collaboration

Top Co-Authors

Yi-Zeng Liang, Central South University
Qing-Song Xu, Central South University
Dong-Sheng Cao, Central South University
Mao-Mao Zeng, Central South University
Xian Chen, Central South University
Bing Wang, Central South University
Liang-Xiao Zhang, Dalian Institute of Chemical Physics
Wei Fan, Central South University
Lunzhao Yi, Kunming University of Science and Technology
Yong-Huan Yun, Central South University