Yong Liang
Macau University of Science and Technology
Publication
Featured research published by Yong Liang.
BMC Bioinformatics | 2013
Yong Liang; Cheng Liu; Xin-Ze Luan; Kwong-Sak Leung; Tak-Ming Chan; Zongben Xu; Hai Zhang
Background: Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers will significantly help to classify different cancer types and improve prediction accuracy. Regularization is one of the effective approaches to gene selection in microarray data, which generally contain a large number of genes and a small number of samples. In recent years, various approaches have been developed for gene selection in microarray data. Generally, they are divided into three categories: filter, wrapper and embedded methods. Regularization methods are an important embedded technique and perform continuous shrinkage and automatic gene selection simultaneously. Recently, there has been growing interest in applying regularization techniques to gene selection. A popular regularization technique is Lasso (L1), and many L1 type regularization terms have been proposed in recent years. Theoretically, Lq type regularization with a lower value of q leads to better solutions with more sparsity. Moreover, the L1/2 regularization can be taken as a representative of the Lq (0 < q < 1) regularizations and has been shown to have many attractive properties.
Results: In this work, we investigate a sparse logistic regression with the L1/2 penalty for gene selection in cancer classification problems, and propose a coordinate descent algorithm with a new univariate half thresholding operator to solve the L1/2 penalized logistic regression. Experimental results on artificial and microarray data demonstrate the effectiveness of our proposed approach compared with other regularization methods. In particular, on 4 publicly available gene expression datasets, the L1/2 regularization method succeeded using only about 2 to 14 predictors (genes), compared to about 6 to 38 genes for the ordinary L1 and elastic net regularization approaches.
Conclusions: Our evaluations show that the sparse logistic regression with the L1/2 penalty achieves higher classification accuracy than the ordinary L1 and elastic net regularization approaches, while selecting fewer but informative genes. This is an important consideration for screening and diagnostic applications, where the goal is often to develop an accurate test using as few features as possible in order to control cost. Therefore, the sparse logistic regression with the L1/2 penalty is an effective technique for gene selection in real classification problems.
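The abstract above names a univariate half thresholding operator but does not give its formula. The following is a minimal sketch of one closed form that appears in the L1/2 thresholding literature, written for the scalar problem of minimizing (beta - omega)^2 + lam * |beta|^(1/2). The function name `half_threshold`, the scaling of `lam`, and the threshold constant are assumptions made for illustration; the "new" operator proposed in the paper may use a different constant.

```python
import math


def half_threshold(omega: float, lam: float) -> float:
    """Half thresholding for min over beta of (beta - omega)**2 + lam * |beta|**0.5.

    Sketch only: the threshold constant below is an assumption taken from the
    general L1/2 literature, not from the abstract above.
    """
    # Inputs at or below the threshold are set exactly to zero (sparsity).
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(omega) <= threshold:
        return 0.0
    # Closed-form root of the stationarity condition for the nonzero case.
    arg = (lam / 8.0) * (abs(omega) / 3.0) ** -1.5
    phi = math.acos(min(1.0, arg))  # clip guards against rounding above 1
    return (2.0 / 3.0) * omega * (
        1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0)
    )
```

In a coordinate descent loop, an operator of this kind would be applied to one coefficient at a time while the others are held fixed; small inputs are zeroed and large inputs are shrunk less aggressively than under L1 soft thresholding.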
Nature Communications | 2017
Jing-Rong Wang; Wei-Na Gao; Rudolf Grimm; Shibo Jiang; Yong Liang; Hua Ye; Zhan-Guo Li; Lee-Fong Yau; Hao Huang; Ju Liu; Min Jiang; Qiong Meng; Tian-Tian Tong; Hai-Hui Huang; Stephanie Lee; Xing Zeng; Liang Liu; Zhihong Jiang
N-linked glycans on immunoglobulin G (IgG) have been associated with pathogenesis of diseases and the therapeutic functions of antibody-based drugs; however, low-abundance species are difficult to detect. Here we show a glycomic approach to detect these species on human IgGs using a specialized microfluidic chip. We discover 20 sulfated and 4 acetylated N-glycans on IgGs. Using a multiple reaction monitoring method, we precisely quantify these previously undetected low-abundance, trace and even ultra-trace N-glycans. From 277 patients with rheumatoid arthritis (RA) and 141 healthy individuals, we also identify N-glycan biomarkers for the classification of both rheumatoid factor (RF)-positive and negative RA patients, as well as anti-citrullinated protein antibodies (ACPA)-positive and negative RA patients. This approach may identify N-glycosylation-associated biomarkers for other autoimmune and infectious diseases and lead to the exploration of promising glycoforms for antibody therapeutics.
Post-translational modifications can affect antibody function in health and disease, but identification of all variants is difficult using existing technologies. Here the authors develop a microfluidic method to identify and quantify low-abundance IgG N-glycans and show some of these IgGs can be used as biomarkers for rheumatoid arthritis.
Applied Soft Computing | 2014
Cheng Liu; Yong Liang; Xin-Ze Luan; Kwong-Sak Leung; Tak-Ming Chan; Zongben Xu; Hai Zhang
In this paper, we investigate the use of the L1/2 regularization method for variable selection based on Cox's proportional hazards model. The L1/2 regularization can be taken as a representative of Lq (0
PLOS ONE | 2016
Hai-Hui Huang; Xiao-Ying Liu; Yong Liang
Cancer classification and feature (gene) selection play an important role in knowledge discovery in genomic data. Although logistic regression is one of the most popular classification methods, it does not induce feature selection. In this paper, we present a new hybrid L1/2+2 regularization (HLR) function, a linear combination of L1/2 and L2 penalties, to select the relevant genes in logistic regression. The HLR approach inherits some attractive characteristics from the L1/2 penalty (sparsity) and the L2 penalty (a grouping effect, where highly correlated variables enter or leave a model together). We also propose a novel univariate HLR thresholding approach to update the estimated coefficients and develop a coordinate descent algorithm for the HLR penalized logistic regression model. The empirical results and simulations indicate that the proposed method is highly competitive with several state-of-the-art methods.
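As a rough illustration of "a linear combination of L1/2 and L2 penalties", the sketch below evaluates such a penalty for a coefficient vector. The function name `hlr_penalty` and the two-parameter weighting (`lam1`, `lam2`) are assumptions for illustration; the abstract does not state how the paper parametrizes the combination.

```python
import math
from typing import Sequence


def hlr_penalty(beta: Sequence[float], lam1: float, lam2: float) -> float:
    """Hypothetical hybrid penalty: lam1 * sum |b|^(1/2) + lam2 * sum b^2."""
    # L1/2 part: encourages sparsity (many coefficients exactly zero).
    l_half = sum(math.sqrt(abs(b)) for b in beta)
    # L2 part: encourages correlated predictors to share similar coefficients.
    l_two = sum(b * b for b in beta)
    return lam1 * l_half + lam2 * l_two
```

For example, `hlr_penalty([4.0, 0.0, -1.0], 1.0, 0.5)` adds an L1/2 part of 3.0 and a weighted L2 part of 8.5.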
Science in China Series F: Information Sciences | 2012
Hai Zhang; Yong Liang; HaiLiang Gou; Zongben Xu
We show the essential ability of sparse signal reconstruction of different compressive sensing strategies, which include the L1 regularization, the L0 regularization (thresholding iteration algorithm and OMP algorithm), the Lq (0 < q ⩽ 1) regularizations, the Log regularization and the SCAD regularization. Taking the phase diagram as the basic tool for analysis, we find that (i) the solutions of the L0 regularization using the hard thresholding algorithm and the OMP algorithm are similar to those of the L1 regularization; (ii) the Lq regularization with decreasing value of q, the Log regularization and the SCAD regularization can attain sparser solutions than the L1 regularization; (iii) the L1/2 regularization can be taken as a representative of the Lq (0 < q < 1) regularizations. When 1/2 < q < 1, the L1/2 regularization always yields the sparsest solutions, and when 0 < q < 1/2 the performance of the regularizations shows no significant difference. The results of this paper provide experimental evidence for our previous work.
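To make the contrast between the L1 and L0 strategies concrete, here is a minimal sketch of the two scalar thresholding rules that iterative algorithms for these penalties are typically built on. The function names and the scaling of `lam` (penalty added to a squared error without a factor of 1/2) are assumptions chosen for illustration, not taken from the paper.

```python
import math


def soft_threshold(omega: float, lam: float) -> float:
    """L1 rule: minimizer of (beta - omega)**2 + lam * |beta|."""
    # Every surviving value is shrunk toward zero by lam / 2.
    return math.copysign(max(abs(omega) - lam / 2.0, 0.0), omega)


def hard_threshold(omega: float, lam: float) -> float:
    """L0 rule: minimizer of (beta - omega)**2 + lam * (beta != 0)."""
    # Values are either kept unchanged or set to zero; there is no shrinkage.
    return omega if abs(omega) > math.sqrt(lam) else 0.0
```

Soft thresholding biases large coefficients toward zero, while hard thresholding leaves them untouched; Lq rules with 0 < q < 1 sit between these two behaviours.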
BMC Medical Genomics | 2016
Yong Liang; Hua Chai; Xiao-Ying Liu; Zongben Xu; Hai Zhang; Kwong-Sak Leung
Background: One of the most important objectives of clinical cancer research is to diagnose cancer more accurately based on patients' gene expression profiles. Both the Cox proportional hazards model (Cox) and the accelerated failure time model (AFT) have been widely adopted for high-risk and low-risk classification or survival time prediction in patients' clinical treatment. Nevertheless, two main dilemmas limit the accuracy of these prediction methods. One is that the small sample size and censored data remain a bottleneck for training a robust and accurate Cox classification model. The other is that tumours with similar phenotypes and prognoses can actually be completely different diseases at the genotype and molecular level. Thus, the utility of the AFT model for survival time prediction is limited when such biological differences between the diseases have not been previously identified.
Methods: To try to overcome these two main dilemmas, we propose a novel semi-supervised learning method based on the Cox and AFT models to accurately predict the treatment risk and the survival time of patients. Moreover, we adopt the efficient L1/2 regularization approach in the semi-supervised learning method to select the relevant genes, which are significantly associated with the disease.
Results: The results of the simulation experiments show that the semi-supervised learning model can significantly improve the predictive performance of the Cox and AFT models in survival analysis. The proposed procedures have been successfully applied to four real microarray gene expression and artificial evaluation datasets.
Conclusions: The advantages of our proposed semi-supervised learning method include: 1) a significant increase in the available training samples from censored data; 2) high capability for identifying the survival risk classes of patients in the Cox model; 3) high predictive accuracy for patients' survival time in the AFT model; 4) strong capability for relevant biomarker selection. Consequently, our proposed semi-supervised learning model is a more appropriate tool for survival analysis in clinical cancer research.
Computers in Biology and Medicine | 2015
Hua Chai; Yong Liang; Xiao-Ying Liu
The analysis of high-dimensional, low-sample-size microarray data for survival analysis of cancer patients is an important problem. It is a huge challenge to select the significantly relevant biomarkers from microarray gene expression datasets, in which the number of genes far exceeds the number of samples. In this article, we develop a robust prediction approach for the survival time of patients using an L(1/2) regularization estimator with the accelerated failure time (AFT) model. The L(1/2) regularization can be seen as a typical representative of the L(q) (0<q<1) regularization methods and has shown many attractive features. To address the problem of relevant gene selection in high-dimensional biological data, we implemented the L(1/2) regularized AFT model using the coordinate descent algorithm with a renewed half thresholding operator. The results of the simulation experiment showed that the L(1/2) regularized AFT model yields a more accurate and sparser predictor for survival analysis than other L1 type regularization methods. The proposed procedures are applied to five real DNA microarray datasets to efficiently predict the survival time of patients based on a set of clinical prognostic factors and gene signatures.
The Scientific World Journal | 2013
Xiao-Ying Liu; Yong Liang; Zongben Xu; Hai Zhang; Kwong-Sak Leung
A new adaptive L1/2 shooting regularization method for variable selection based on Cox's proportional hazards model is proposed. This adaptive L1/2 shooting algorithm can be easily obtained by optimizing a reweighted iterative series of L1 penalties combined with a shooting strategy for the L1/2 penalty. Simulation results based on high-dimensional artificial data show that the adaptive L1/2 shooting regularization method can be more accurate for variable selection than the Lasso and adaptive Lasso methods. The results from a real gene expression dataset (DLBCL) also indicate that the L1/2 regularization method performs competitively.
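The idea of replacing an L1/2 penalty by a reweighted series of L1 penalties can be sketched as follows: linearizing |b|^(1/2) around the current estimate gives a per-coefficient L1 weight of 1 / (2 * sqrt(|b|)), and each coordinate is then updated by a weighted soft threshold. This is a generic sketch of that linearization, not the algorithm from the paper; the function names, the `eps` safeguard, and the `lam` scaling are all assumptions.

```python
import math
from typing import List, Sequence


def l_half_weights(beta: Sequence[float], eps: float = 1e-6) -> List[float]:
    """L1 weights from linearizing |b|**0.5 at the current estimate beta."""
    # eps (an assumed safeguard) keeps the weight finite when b is zero.
    return [1.0 / (2.0 * math.sqrt(abs(b) + eps)) for b in beta]


def weighted_soft_threshold(omega: float, lam: float, weight: float) -> float:
    """Minimizer of (beta - omega)**2 + lam * weight * |beta|."""
    return math.copysign(max(abs(omega) - lam * weight / 2.0, 0.0), omega)
```

Coefficients that are currently small receive large weights and are pushed to zero on the next pass, while large coefficients are penalized only lightly.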
BioMed Research International | 2015
Hai-Hui Huang; Yong Liang; Xiao-Ying Liu
Identifying biomarkers and signaling pathways is a critical step in genomic studies, in which regularization is a widely used feature extraction approach. However, most regularizers are based on the L1-norm; their results are not sparse or interpretable enough and are asymptotically biased, especially in genomic research. Recently, a large amount of molecular interaction information about disease-related biological processes has been gained and gathered in various databases, which focus on many aspects of biological systems. In this paper, we use an enhanced L1/2 penalized solver to penalize a network-constrained logistic regression model, called an enhanced L1/2 net, where the predictors are based on gene-expression data with biological network knowledge. Extensive simulation studies showed that our proposed approach outperforms L1 regularization, the old L1/2 penalized solver, and the Elastic net approaches in terms of classification accuracy and stability. Furthermore, we applied our method to lung cancer data analysis and found that it achieves higher predictive accuracy than L1 regularization, the old L1/2 penalized solver, and the Elastic net approaches, while selecting fewer but informative biomarkers and pathways.
Bio-medical Materials and Engineering | 2015
Hai-Hui Huang; Xiao-Ying Liu; Yong Liang; Hua Chai; Liang-Yong Xia
Tuberculosis (TB), caused by infection with Mycobacterium tuberculosis, is still a major threat to human health worldwide. Current diagnostic methods have some limitations, such as sample collection problems or unsatisfactory sensitivity and specificity. Moreover, it is hard to distinguish TB from some other lung diseases without invasive biopsy. In this paper, logistic models with three representative regularization approaches, namely Lasso (the most popular regularization method), L1/2 (a method that tends to achieve sparser solutions than Lasso) and Elastic Net (a method that encourages a grouping effect of genes in the results), are adopted together to select the common gene signatures in microarray data of peripheral blood cells. As a result, 13 common gene signatures were selected, and a classifier based on them was subsequently constructed using the SVM approach, which can accurately distinguish tuberculosis from other pulmonary diseases and healthy controls. In the test and validation datasets of the blood gene expression profiles, the generated classification model achieved 91.86% sensitivity and 93.48% specificity on average. Its sensitivity is improved by 6%, while only 26% of the gene signatures are used, compared with recent research results. These 13 gene signatures selected by our methods can be used as the basis of a blood-based test for distinguishing TB from other pulmonary diseases and healthy controls.
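For readers less familiar with the two metrics quoted above, the sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The function names are illustrative, and any counts passed in are hypothetical; the abstract does not report the underlying counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positives (e.g. TB cases) that are detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of actual negatives (e.g. non-TB subjects) correctly ruled out."""
    return true_neg / (true_neg + false_pos)
```

With hypothetical counts of 90 detected and 10 missed cases, `sensitivity(90, 10)` evaluates to 0.9.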