Liang-Xiao Zhang
Dalian Institute of Chemical Physics
Publication
Featured research published by Liang-Xiao Zhang.
Analytica Chimica Acta | 2011
Dong-Sheng Cao; Qian-Nan Hu; Qing-Song Xu; Yan-Ning Yang; Jian-Chao Zhao; Hongmei Lu; Liang-Xiao Zhang; Yi-Zeng Liang
A modified random forest (RF) algorithm, a novel machine learning technique, was developed to estimate the maximum recommended daily dose (MRDD) for a large and diverse pharmaceutical dataset for phase I human trials, using substructure fingerprint descriptors calculated from molecular structure alone. This type of molecular descriptor encodes molecular structure as a series of binary bits that represent the presence or absence of particular substructures in the molecule, and can thereby accurately and directly depict the local information hidden in the molecule. Two model validation approaches, 5-fold cross-validation and an independent validation set, were used to assess the prediction capability of the models. The modified RF gave a prediction accuracy of 80.45%, sensitivity of 75.08% and specificity of 84.85% for 5-fold cross-validation, and a prediction accuracy of 80.5%, sensitivity of 76.47% and specificity of 83.48% for the independent validation set, which are on the whole better than those of the original RF. In addition, the important substructure fingerprints recognized by the RF technique gave some insight into the structural features related to the toxicity of pharmaceuticals, which could help provide an intuitive understanding for medicinal chemists.
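The accuracy, sensitivity and specificity quoted in this abstract are standard confusion-matrix quantities. The following is a minimal sketch of how they are computed, assuming binary labels with 1 as the positive class; the function name and the toy labels are illustrative only and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    Assumption: 1 marks the positive class and 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Toy labels, invented for illustration (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
acc, sens, spec = classification_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a 5-fold cross-validation these counts would be accumulated over the five held-out folds before the ratios are taken.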
Analytica Chimica Acta | 2011
Dong-Sheng Cao; Mao-Mao Zeng; Lunzhao Yi; Bing Wang; Qing-Song Xu; Qian-Nan Hu; Liang-Xiao Zhang; Hongmei Lu; Yi-Zeng Liang
The large amounts of data produced by high-throughput metabolomics experiments are becoming increasingly complex, which poses considerable challenges to existing statistical modeling. There is therefore a need for statistically efficient approaches to mine the underlying metabolite information contained in metabolomics data. In this work, we developed a novel kernel Fisher discriminant analysis (KFDA) algorithm by constructing an informative kernel based on a decision tree ensemble. The constructed kernel can effectively encode the similarities of metabolomics samples between informative metabolites/biomarkers in specific parts of the measurement space. At the same time, informative metabolites or potential biomarkers can be discovered by variable importance ranking in the process of building the kernel. Moreover, with such a kernel KFDA can also deal, to some extent, with nonlinear relationships in the metabolomics data. Finally, two real metabolomics datasets and one simulated dataset were used to demonstrate the performance of the proposed approach through comparison with other approaches.
Journal of Chromatography A | 2012
Jun Yan; Dong-Sheng Cao; Fang-Qiu Guo; Liang-Xiao Zhang; Min He; Jian-Hua Huang; Qing-Song Xu; Yi-Zeng Liang
A quantitative structure-retention relationship (QSRR) study was performed for 656 flavor compounds with high structural diversity on four stationary phases of different polarities, using topological, constitutional, quantum chemical and geometrical descriptors. Statistical methods were employed to find an informative subset of descriptors that can accurately predict the gas chromatographic retention indices (RIs). Multivariable linear regression (MLR) was used to map the descriptors to the RIs. The stability and validity of the models were tested by internal and external validation, and good stability and predictive ability were obtained. The resulting QSRR models were well correlated, with cross-validated squared correlation coefficients (Q²) of 0.9595, 0.9528, 0.9595 and 0.9223 on the stationary phases OV101, DB5, OV17 and C20M, respectively. The molecular properties known to be relevant for the GC retention index, such as molecular size, branching, electron density distribution and hydrogen bond effect, were well covered by the generated descriptors. The descriptors used in the models on the four stationary phases were compared, and some reasonable explanations of the gas chromatographic retention mechanism were obtained. The models may be useful for predicting the retention indices of flavor compounds when experimental data are unavailable.
Chromatographia | 2001
Jiping Chen; Xinmiao Liang; Zhang Q; Liang-Xiao Zhang
Summary: A numerical approach has been developed for the correlation of retention times (total retention time) with temperature in gas chromatography, which allows the calculation of retention parameters, including the retention index, from data acquired under two or more different temperature program conditions. With this procedure the temperature conditions can be further optimized, especially when a temperature-programmed run is the most suitable mode in the preliminary development of an analytical method for an unknown sample.
Talanta | 2012
Liang-Xiao Zhang; Binbin Tan; Maomao Zeng; Hongmei Lu; Yi-Zeng Liang
Gas chromatography-mass spectrometry (GC-MS) is routinely employed to analyze small molecules in various samples. The major challenge of GC-MS data processing is to identify the unknown compounds in samples. Searching libraries of mass spectra and retention indices is the commonly used method. However, the current libraries are often built by collecting data from different groups. For unknown compounds with similar mass spectra and retention indices (e.g. geometric (cis/trans) isomers), inaccurate results are sometimes returned. In this case, costly standard compounds have to be used in every analysis. In this report, taking the identification of fatty acids as an example, we propose a strategy of establishing a special database constructed from equivalent chain length (ECL) values measured under uniform conditions and mass spectra of fatty acid methyl esters (FAMEs). The mass spectral characteristics are first used to identify all expected straight-chain saturated fatty acids, which are then used to calculate the ECL values of the fatty acids in the sample. Finally, the ECL values of the fatty acids in the sample are compared with those in the customized database to identify their structures. The results, validated with authentic standards, showed that the method developed in this report could effectively identify similar unknown compounds (FAMEs in human plasma).
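The ECL step in this abstract can be illustrated with a small calculation. The sketch below assumes that an unknown's ECL is obtained by linear interpolation of retention time between the two bracketing straight-chain saturated FAMEs, and that database matching uses a fixed ECL tolerance. The abstract does not state the exact formula or tolerance, so both are assumptions, and all names and numbers are hypothetical.

```python
def equivalent_chain_length(rt, saturated_refs):
    """Interpolate an ECL value from a retention time.

    saturated_refs maps the carbon number of each identified straight-chain
    saturated FAME to its retention time (same run, same units as rt).
    Assumption: linear interpolation between the bracketing references.
    """
    carbons = sorted(saturated_refs)
    for lo, hi in zip(carbons, carbons[1:]):
        t_lo, t_hi = saturated_refs[lo], saturated_refs[hi]
        if t_lo <= rt <= t_hi:
            return lo + (hi - lo) * (rt - t_lo) / (t_hi - t_lo)
    raise ValueError("retention time is outside the range of the references")


def match_ecl(ecl, database, tolerance=0.05):
    """Return database entries whose ECL lies within the tolerance."""
    return sorted(name for name, ref in database.items()
                  if abs(ref - ecl) <= tolerance)


# Invented retention times (min) and database values, for illustration only.
refs = {16: 20.0, 18: 24.0}
database = {"isomer A": 17.48, "isomer B": 17.62}

ecl = equivalent_chain_length(23.0, refs)
print(ecl, match_ecl(ecl, database))
```

Because the reference ECL values are measured under uniform conditions, a tight tolerance can in principle separate isomers whose mass spectra are nearly identical.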
Journal of Chromatography A | 2010
Liang-Xiao Zhang; Yifeng Yun; Yi-Zeng Liang; Dong-Sheng Cao
The mass spectral characteristics of wax esters were systematically summarized and interpreted through data mining of their standard mass spectra taken from the NIST standard mass spectral library. Combined with the retention index rules described in a previous study, an automatic system was then developed to identify structural information for wax esters from GC/MS data. Tests on both simulated and real GC/MS data indicate that this system can identify wax esters, except polyunsaturated ones, and that the mass spectral characteristics provide useful and effective information for the identification of wax esters.
Journal of Chemometrics | 2011
Dong-Sheng Cao; Yi-Zeng Liang; Qing-Song Xu; Liang-Xiao Zhang; Qian-Nan Hu; Hong-Dong Li
Ensemble approaches generally perform well when the base classifiers are diverse and accurate. In the present study, a feature importance sampling-based adaptive random forest (fisaRF) was proposed to obtain classification performance superior to that of the primal one-step random forest (RF). fisaRF uses a convenient yet very effective procedure called feature importance sampling (FIS) to select a more eligible feature subset at each splitting node instead of simple random sampling, thereby strengthening the accuracy of individual trees without sacrificing the diversity between them. Additionally, the iterative use of the feature importance obtained in the previous step can adaptively capture the most significant features in the data and effectively deal with multiple classification problems, which are not easily solved by other feature importance indexes. The proposed fisaRF was applied to classify three structure–activity relationship (SAR) data sets proposed by Xue et al. [1] together with disinfection by-products (DBPs) data, and was compared with the primal one-step RF induced by simple random sampling. The comparison revealed that fisaRF can effectively improve the classification accuracy and prediction confidence for each sample, and it was therefore considered a very useful tool for screening potential lead compounds.
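The idea of feature importance sampling can be sketched as weighted sampling without replacement: at a splitting node, candidate features are drawn with probability proportional to their importance from the previous iteration rather than uniformly. The abstract does not give the exact sampling scheme, so the function below is one plausible reading under that assumption, with hypothetical names and importances.

```python
import random


def importance_sample(importances, k, rng):
    """Draw k distinct feature indices, weighted by importance.

    Assumptions: importances are non-negative, and at least k of them
    are strictly positive (otherwise random.choices raises ValueError).
    """
    remaining = list(range(len(importances)))
    weights = list(importances)
    chosen = []
    for _ in range(k):
        # Pick a position among the features that are still available.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return chosen


# Hypothetical importances from a previous forest (illustrative only).
importances = [0.5, 0.3, 0.1, 0.1]
rng = random.Random(0)
print(importance_sample(importances, 2, rng))
```

Simple random sampling is the special case in which all weights are equal, which is how the one-step RF chooses its candidate features.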
Analyst | 2009
Liang-Xiao Zhang; Yi-Zeng Liang; Aiming Chen
Gas chromatography-mass spectrometry (GC-MS) is widely used in many fields because of its high sensitivity, high resolution and reproducibility. The major challenge of this analytical technology is the identification of components in complex samples. Mass spectral library searching is commonly employed to assist in the identification of unknown spectra. However, this widely available method only provides a hit-list of candidates ordered by their numerical similarity indices. When an unknown compound has many isomers or is absent from the reference library, this approach may be less useful. Classification of mass spectra, a technique complementary to library searching, is beneficial to computer-aided mass spectral interpretation but suffers from the fact that the variables used in the classifier are usually uninterpretable. In this study, a novel classifier is built based on data mining and feature analysis. In this classifier, the neutral loss is used to identify the differences between the mass spectra of alcohols and ethers in the data set. Comparison with two chemometric methods, Fisher ratios linear discriminant analysis (LDA) and genetic algorithm partial least squares discriminant (GA-DPLS) analysis, shows that our method achieves better predictive ability. More importantly, this method is able to predict whether compounds can be classified correctly or not.
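A neutral loss is the mass difference between a precursor ion and a fragment ion. The sketch below computes neutral losses relative to the molecular ion only, which is an assumption made for illustration; the classifier's actual features and rules are not given in the abstract, and the peak list is a toy example.

```python
def neutral_losses(molecular_ion_mz, fragment_mzs):
    """Return the sorted, distinct neutral losses from the molecular ion.

    Assumption: losses are taken relative to the molecular ion only;
    fragments at or above the molecular ion m/z are ignored.
    """
    losses = {molecular_ion_mz - mz for mz in fragment_mzs
              if mz < molecular_ion_mz}
    return sorted(losses)


# Toy nominal-mass peak list (illustrative, not from the paper's data set).
print(neutral_losses(74, [56, 59, 31]))  # [15, 18, 43]
```

A loss of 18 corresponds to the mass of water and a loss of 15 to a methyl radical, which is why neutral losses remain chemically interpretable as classifier variables.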
Chromatographia | 2012
Wan Zhang; Liang-Xiao Zhang; Hong-Dong Li; Yi-Zeng Liang; Rong Hu; Nannan Liang; Wei Fan; Dong-Sheng Cao; Lunzhao Yi; Jidong Xia
Postoperative cognitive dysfunction (POCD) is a subtle cognitive dysfunction, especially memory impairment, that lasts for weeks or months after surgery. The underlying pathophysiological mechanism of POCD is still unclear. The aim of this exploratory study was to investigate the potential mechanism of POCD by using GC–MS to identify the differences among the metabolic profiles of control, POCD and no-POCD rats after isoflurane anesthesia, and subsequently to discover POCD biomarkers. A feature-variable selection method, subwindow permutation analysis (SPA), was employed to seek the key metabolites distinguishing the POCD group from the control group and from the no-POCD group. Two key metabolites, hexadecanoic acid and myo-inositol, were selected for both comparisons. This suggests that they may reflect the disturbances between POCD and control rats and between POCD and no-POCD rats, and may be potential biomarkers of POCD. Furthermore, possible pathogenesis was considered on the basis of the relevant literature and pathway databases, which suggested that POCD is probably related to disturbed hexadecanoic acid and myo-inositol metabolism. The results indicate that the proposed metabolic profiling approach and the SPA method may be effective for exploring metabolic perturbations and possible biomarkers of POCD.
Chromatographia | 2001
Jiping Chen; Xinmiao Liang; Zhang Q; Liang-Xiao Zhang
Summary: An empirical equation is proposed to accurately correlate isothermal data over a wide range of temperature. With the equation ln k = A* + B*/T^λ, the retention times of different solutes tested on OV-101, SE-54 and PEG 20M capillary columns were correlated successfully even when λ is assigned a constant value of 1.7. Comparison with ln k = A + B/T and ln k = c + d/T + h/T^2 shows that the proposed equation is more accurate and is applicable to extrapolation, especially from data at high temperature to those at low temperature. The parameters A* and B*, as well as A and B, are also discussed. The linear correlation between A* and B* is weaker than that between A and B.
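With λ fixed, the equation ln k = A* + B*/T^λ is linear in the transformed variable T^(-λ), so A* and B* can be estimated by ordinary least squares. The following sketch assumes that k is the retention factor and T the absolute column temperature; the function names and the synthetic data are illustrative and do not come from the paper.

```python
import math


def fit_retention(temps_k, ks, lam=1.7):
    """Least-squares estimate of (A*, B*) in ln k = A* + B*/T**lam.

    temps_k: absolute temperatures; ks: measured k values (assumed to be
    retention factors). lam is held constant, 1.7 by default.
    """
    xs = [t ** -lam for t in temps_k]      # transformed predictor T^(-lam)
    ys = [math.log(k) for k in ks]         # response ln k
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_star = sxy / sxx                     # slope
    a_star = mean_y - b_star * mean_x      # intercept
    return a_star, b_star


def predict_k(a_star, b_star, temp_k, lam=1.7):
    """Evaluate k at a temperature, e.g. to extrapolate to a lower T."""
    return math.exp(a_star + b_star / temp_k ** lam)


# Synthetic data generated from assumed parameters (not measured values).
true_a, true_b = -8.0, 2.0e5
temps = [373.15, 393.15, 413.15, 433.15]
ks = [predict_k(true_a, true_b, t) for t in temps]

a_fit, b_fit = fit_retention(temps, ks)
print(a_fit, b_fit, predict_k(a_fit, b_fit, 353.15))
```

Setting `lam=1.0` in the same functions gives the two-parameter form ln k = A + B/T, which allows the two fits to be compared on the same data.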