Jun Zhang
Anhui University
Publication
Featured research published by Jun Zhang.
Analytical Chemistry | 2011
Xiaoli Wei; Wenlong Sun; Xue Shi; Imhoi Koo; Bing Wang; Jun Zhang; Xinmin Yin; Yunan Tang; Bogdan Bogdanov; Seongho Kim; Zhanxiang Zhou; Craig J. McClain; Xiang Zhang
Data analysis in metabolomics is currently a major challenge, particularly when large sample sets are analyzed. Herein, we present a novel computational platform entitled MetSign for high-resolution mass spectrometry-based metabolomics. By converting the instrument raw data into mzXML format as its input data, MetSign provides a suite of bioinformatics tools to perform raw data deconvolution, metabolite putative assignment, peak list alignment, normalization, statistical significance tests, unsupervised pattern recognition, and time course analysis. MetSign uses a modular design and an interactive visual data mining approach to enable efficient extraction of useful patterns from data sets. Analysis steps, designed as containers, are presented with a wizard that guides the user through the analyses. Each analysis step might contain multiple analysis procedures and/or methods and serves as a pausing point where users can interact with the system to review the results, to shape the next steps, and to return to previous steps to repeat them with different methods or parameter settings. Analysis of metabolite extract of mouse liver with spiked-in acid standards shows that MetSign outperforms the existing publicly available software packages. MetSign has also been successfully applied to investigate the regulation and time course trajectory of metabolites in hepatic liver.
Journal of Chromatography A | 2011
Yaping Zhao; Jun Zhang; Bing Wang; Seong Ho Kim; Aiqin Fang; Bogdan Bogdanov; Zhanxiang Zhou; Craig J. McClain; Xiang Zhang
A method was developed to calculate the second dimension retention index of comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GC×GC/TOF-MS) data using n-alkanes as reference compounds. The retention times of the C7-C31 alkanes acquired during 24 isothermal experiments cover the 0-6 s retention time area in the second dimension retention time space, which makes it possible to calculate the retention indices of target compounds from the corresponding retention time values without the extension of the retention space of the reference compounds. An empirical function was proposed to show the relationship among the second dimension retention time, the temperature of the second dimension column, and the carbon number of the n-alkanes. The proposed function is able to extend the second dimension retention time beyond the reference n-alkanes by increasing the carbon number. The extension of carbon numbers in reference n-alkanes up to two more carbon atoms introduces <10 retention index units (iu) of deviation. The effectiveness of using the proposed method was demonstrated by analyzing a mixture of compound standards in temperature programmed experiments using 6 different initial column temperatures. The standard deviation of the calculated retention index values of the compound standards fluctuated from 1 to 12 iu with a mean standard deviation of 5 iu.
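For orientation, the sketch below computes a retention index by logarithmic interpolation between two bracketing n-alkanes. It is the textbook isothermal Kováts definition, not the empirical temperature-dependent function proposed in the paper, and the function name, argument names, and the optional hold-up time argument are assumptions made for illustration.

```python
import math


def kovats_index(t_x, t_n, t_n1, n, t_hold=0.0):
    """Isothermal Kovats retention index of a target compound.

    t_x    : retention time of the target compound
    t_n    : retention time of the n-alkane with n carbons (elutes before)
    t_n1   : retention time of the n-alkane with n + 1 carbons (elutes after)
    n      : carbon number of the earlier-eluting alkane
    t_hold : hold-up time subtracted from all retention times (assumed 0 here)
    """
    # Adjusted retention times (retention time minus hold-up time).
    a_x, a_n, a_n1 = t_x - t_hold, t_n - t_hold, t_n1 - t_hold
    if not (0 < a_n <= a_x <= a_n1 and a_n < a_n1):
        raise ValueError("target must be bracketed by the two reference alkanes")
    # Logarithmic interpolation between the two reference alkanes.
    fraction = (math.log(a_x) - math.log(a_n)) / (math.log(a_n1) - math.log(a_n))
    return 100.0 * (n + fraction)
```

A compound that co-elutes with the C10 alkane gets an index of 1000 under this definition, and one that co-elutes with C11 gets 1100. Extrapolating beyond the last reference alkane, which the paper addresses with its empirical function, is deliberately rejected here.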
Journal of Chromatography A | 2012
Jun Zhang; Imhoi Koo; Bing Wang; Qingwei Gao; Chun-Hou Zheng; Xiang Zhang
Retention index (RI) is useful for metabolite identification. However, when RI is integrated with mass spectral similarity for metabolite identification, the literature reports conflicting RI threshold settings. In this study, a large-scale test dataset of 5844 compounds with both mass spectra and RI information was created from the National Institute of Standards and Technology (NIST) repetitive mass spectra (MS) and RI library. Three MS similarity measures, namely the NIST composite measure, the real part of the Discrete Fourier Transform (DFT.R), and the detail of the Discrete Wavelet Transform (DWT.D), were used to investigate the accuracy of compound identification using the test dataset. To imitate real identification experiments, the NIST MS main library was employed as the reference library and the test dataset was used as search data. Our study shows that the optimal RI thresholds are 22, 15, and 15 i.u. for the NIST composite, DFT.R, and DWT.D measures, respectively, when the RI and mass spectral similarity are integrated for compound identification. Compared to mass spectrum matching alone, using both RI and mass spectral matching can improve the identification accuracy by 1.7%, 3.5%, and 3.5% for the three mass spectral similarity measures, respectively. It is concluded that the improvement that RI matching brings to compound identification depends heavily on the mass spectral similarity measure and the accuracy of the RI data.
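The idea of combining an RI threshold with spectral matching can be sketched as a two-stage search: discard library entries whose RI differs from the query by more than the threshold, then rank the survivors by spectral similarity. The sketch uses plain cosine similarity as a stand-in for the three measures studied in the paper, and the library entries, compound names, and spectra are hypothetical values invented for illustration.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(query_spec, query_ri, library, ri_threshold):
    """Return (name, score) of the best spectral match within the RI window.

    library maps a compound name to a (spectrum, retention index) pair.
    """
    best_name, best_score = None, -1.0
    for name, (spec, ri) in library.items():
        # Stage 1: RI filter. Skip candidates outside the threshold window.
        if abs(ri - query_ri) > ri_threshold:
            continue
        # Stage 2: rank the remaining candidates by spectral similarity.
        score = cosine_similarity(query_spec, spec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical two-entry library with nearly identical spectra.
LIBRARY = {
    "compound_A": ({43: 100, 57: 80, 71: 40}, 1000.0),
    "compound_B": ({43: 100, 57: 80, 71: 45}, 1060.0),
}
QUERY_SPEC, QUERY_RI = {43: 100, 57: 80, 71: 44}, 1005.0
```

With these made-up values, spectral matching alone (an infinite threshold) slightly prefers `compound_B`, while a 22 i.u. window removes it and leaves `compound_A`, which illustrates how the RI filter can change the top hit when spectra are nearly indistinguishable.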
Neurocomputing | 2017
Sen Xia; Peng Chen; Jun Zhang; Xiaoping Li; Bing Wang
This paper proposes a method for automatic image annotation based on multi-feature fusion and a multi-label learning algorithm. During feature fusion, the rotation-invariant uniform local binary pattern (LBP) histogram distribution and the number of connected regions in the image are extracted and fully utilized. In addition to traditional n-order color moments and texture information, the rotation-invariant uniform LBP histogram distribution, the number of connected regions, and the weighted histogram integral are appended to the image features, which helps improve the average precision. Using the multi-label k-nearest neighbor learning algorithm and the Corel5k image data set, different combinations of feature dimensions were compared, showing that the proposed method outperforms the traditional one that uses only basic color moments and texture distribution. In the experiments, the average precision of automatic image annotation improved from 0.2898 to 0.3954.
IEEE Signal Processing Letters | 2012
Zhan-Li Sun; Chun-Hou Zheng; Qingwei Gao; Jun Zhang; De-Xiang Zhang
Eigengenes extracted by independent component analysis (ICA) are an effective feature for tumor classification. In this letter, a novel tumor classification approach is proposed that uses eigengenes and a support vector machine (SVM) based classifier committee learning (CCL) algorithm. In this method, a strategy of random feature subspace division is designed to improve the diversity of the weak classifiers. Gene expression data constructed from different feature subspaces are each modeled by ICA, and the corresponding eigengene sets extracted by the ICA algorithm are used as the inputs of the weak SVM classifiers. Moreover, a Bayesian sum rule (BSR) strategy is designed to integrate the outputs of the weak SVM classifiers and provide a final decision for the tumor category. Experimental results on three DNA microarray datasets demonstrate that the proposed method is effective and feasible for tumor classification.
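The fusion step can be illustrated with the generic sum rule for combining classifier outputs: average the per-class posterior estimates of all committee members and pick the class with the largest average. This is a minimal sketch that assumes equally weighted classifiers and equal class priors; the letter's exact BSR formulation may differ, and the function name and example probabilities are hypothetical.

```python
def sum_rule(posteriors):
    """Fuse classifier outputs by averaging their class posteriors.

    posteriors: one list of class probabilities per classifier, all of the
    same length. Returns (winning class index, averaged probabilities).
    """
    if not posteriors:
        raise ValueError("need at least one classifier output")
    n_classes = len(posteriors[0])
    if any(len(p) != n_classes for p in posteriors):
        raise ValueError("all classifiers must report the same classes")
    # Average the posterior of each class over all classifiers.
    fused = [sum(p[c] for p in posteriors) / len(posteriors)
             for c in range(n_classes)]
    # The decision is the class with the largest averaged posterior.
    label = max(range(n_classes), key=fused.__getitem__)
    return label, fused


# Hypothetical outputs of three weak classifiers for a two-class problem.
COMMITTEE = [[0.9, 0.1], [0.45, 0.55], [0.4, 0.6]]
```

For the hypothetical committee above, a majority vote would choose class 1 (two of three members), whereas the sum rule chooses class 0 because the first member is far more confident than the other two. That sensitivity to confidence is the usual motivation for fusing probabilities rather than hard labels.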
Neurocomputing | 2017
Jun Zhang; Chun-Hou Zheng; Yi Xia; Bing Wang; Peng Chen
A new method using a genetic algorithm and support vector regression with parameter optimization (GASVRPO) was developed for the prediction of compound retention indices (RI) in gas chromatography. The dataset used in this work consists of 252 compounds extracted from the Molecular Operating Environment (MOE) boiling point database. Molecular descriptors were calculated by descriptor tools of the MOE software package. After removing redundant descriptors, 151 descriptors were obtained for each compound. A genetic algorithm (GA) was used to select the best subset of molecular descriptors and the best parameters of SVR to optimize the prediction performance of compound retention indices. A 10-fold cross-validation method was used to evaluate the prediction performance. We compared the performance of our proposed model with three existing methods: GA coupled with multiple linear regression (GAMLR), the subset selected by GAMLR used to train SVR (GAMLRSVR), and GA on SVR (GASVR). The experimental results demonstrate that our proposed GASVRPO model has better predictive performance than the other existing models, with R² > 0.967 and RMSE = 49.94. The prediction accuracy of the GASVRPO model is 96% at 10% of prediction variation.
Neurocomputing | 2017
Jun Zhang; Muchun Zhu; Peng Chen; Bing Wang
Drug-target interaction is key in drug discovery. Because determining drug-target interactions by in vitro experiments is costly and time-consuming, computational methods serve as a complement for identifying them. To address this issue, a random projection ensemble approach is proposed. First, drug compounds are encoded with feature descriptors by the software “PaDEL-Descriptor”. Second, target proteins are encoded with physicochemical properties of amino acids, where 34 relatively independent physicochemical properties are extracted from the 544 properties in the AAindex1 database. Random projection of the drug-target pair vector with different dimensions projects the original space onto a reduced one and thus yields a transformed vector with a fixed dimension. Several random projections build an ensemble REPTree system. Experimental results show that our method significantly outperforms, and runs faster than, other state-of-the-art drug-target predictors on the commonly used drug-target benchmark sets.
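The projection step can be sketched as follows: each ensemble member draws its own random Gaussian matrix and maps the concatenated drug-target vector to a fixed, lower dimension, so that every base learner sees a different view of the same pair. The sketch covers only the projection; the REPTree base learners are not implemented, and the function names, the Gaussian scaling, and the dimensions used are assumptions for illustration.

```python
import random


def random_projection_matrix(in_dim, out_dim, seed):
    """Gaussian random matrix of shape (out_dim, in_dim), scaled by 1/sqrt(out_dim)."""
    rng = random.Random(seed)  # a separate generator keeps each member reproducible
    scale = 1.0 / out_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(vector, matrix):
    """Multiply the projection matrix by the input vector."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("matrix width must match the vector length")
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def ensemble_views(vector, out_dim, seeds):
    """One projected copy of the vector per ensemble member (one seed each)."""
    return [project(vector, random_projection_matrix(len(vector), out_dim, seed))
            for seed in seeds]
```

Calling `ensemble_views(pair_vector, 8, [0, 1, 2])` on a hypothetical 50-dimensional `pair_vector` would yield three different 8-dimensional vectors, each of which could be passed to its own base learner before the predictions are combined.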
Journal of Chromatography A | 2012
Imhoi Koo; Yaping Zhao; Jun Zhang; Seongho Kim; Xiang Zhang
A method of calculating the second dimension hold-up time for comprehensive two-dimensional gas chromatographic (GC×GC) data was developed by incorporating the temperature information of the second dimension column into the calculation model. The model was developed by investigating the relationship between the coefficients in each of six literature-reported nonlinear models and the relationship between each coefficient and the second dimension column temperature. The most robust nonlinear function was selected and further used to construct the new model for calculation of the second dimension retention time, in which the coefficients that have significant correlation with the column temperature are replaced with expressions of column temperature. An advantage of the proposed equation is that eight parameters could explain the second dimension hold-up time as well as retention time corresponding to n-alkanes and column temperature in the entire chromatographic region, including the chromatographic region not bounded by the retention times of n-alkanes. To optimize the experimental design for collecting the isothermal data of n-alkanes to create the second dimension hold-up time model, the column temperature difference and the number of isothermal experiments should be considered simultaneously. It was concluded that a total of 5 or 6 isothermal experiments with a temperature difference of 40 or 50 °C are enough to generate an accurate model. The test mean squared error (MSE) under those conditions ranges from 0.0428 to 0.0532 for calculation of the second dimension hold-up time for GC×GC data.
International Conference on Intelligent Computing | 2011
Bing Wang; Peng Chen; Jun Zhang
Protein-protein interactions play essential roles in protein function. This work introduces a computational model for predicting protein interface residues based only on the physicochemical properties of amino acids. Seventeen amino acid properties are selected from the AAindex database and used as input features of a prediction model, constructed with support vector machines, to infer protein interface residues in protein hetero-complexes. The results demonstrate that the selected properties can capture the difference between interface and non-interface residues.
Chinese Conference on Biometric Recognition | 2017
Yanlin Li; Dexiang Zhang; Jun Zhang; Lina Xun; Qing Yan; Jingjing Zhang; Qingwei Gao; Yi Xia
This paper proposes a novel gait recognition method based on plantar pressure images. Unlike many conventional methods, in which hand-crafted features are extracted explicitly, we use a convolutional neural network (CNN) for both automatic feature extraction and classification. The peak pressure image (PPI) generated from the time series of plantar pressure images is used as the characteristic image for gait recognition in this study. Our gait samples were collected from 109 subjects at three walking speeds, with a total of 18 samples per subject. Experimental results demonstrate that the designed CNN model achieves very high classification accuracy compared with many traditional methods.
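The abstract does not spell out how the PPI is built. The sketch below assumes the common definition, in which each pixel of the PPI holds the maximum pressure recorded at that sensor position over the whole time series of frames. The function name and the nested-list frame representation are assumptions for illustration.

```python
def peak_pressure_image(frames):
    """Per-pixel maximum over a time series of equally sized pressure frames.

    frames: list of 2-D frames, each a list of rows of pressure values.
    """
    if not frames:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    for frame in frames:
        # Every frame must have the same sensor layout as the first one.
        if len(frame) != rows or any(len(row) != cols for row in frame):
            raise ValueError("all frames must have the same shape")
    # Keep, for each sensor position, the largest value seen over time.
    return [[max(frame[r][c] for frame in frames) for c in range(cols)]
            for r in range(rows)]
```

Under this assumption a whole footstep collapses into a single image of fixed size, which is a convenient input for a CNN classifier.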