Huawen Liu
Zhejiang Normal University
Publications
Featured research published by Huawen Liu.
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2016
Thuc Duy Le; Tao Hoang; Jiuyong Li; Lin Liu; Huawen Liu; Shu Hu
Discovering causal relationships from observational data is a crucial problem with applications in many research areas. The PC algorithm is the state-of-the-art constraint-based method for causal discovery. However, the worst-case runtime of the PC algorithm is exponential in the number of nodes (variables), so it is inefficient when applied to high-dimensional data, e.g., gene expression datasets. Meanwhile, advances in computer hardware over the last decade have made multi-core personal computers widely available. There is therefore significant motivation for designing a parallelized PC algorithm that is suitable for personal computers and does not require end users to have parallel computing knowledge beyond their competency in using the PC algorithm. In this paper, we develop parallel-PC, a fast and memory-efficient PC algorithm using parallel computing techniques. We apply our method to a range of synthetic and real-world high-dimensional datasets. Experimental results on a dataset from the DREAM 5 challenge show that the original PC algorithm could not produce any results after running for more than 24 hours, whereas our parallel-PC algorithm finished within around 12 hours on a 4-core CPU computer and in less than six hours on an 8-core CPU computer. Furthermore, we integrate parallel-PC into a causal inference method for inferring miRNA-mRNA regulatory relationships. The experimental results show that parallel-PC helps improve both the efficiency and accuracy of the causal inference algorithm.
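The abstract above does not give implementation details, so the following is only a minimal sketch of the general idea behind parallelizing a PC-style skeleton search: within one level, the conditional independence tests for different edges do not depend on each other, so they can be dispatched to a worker pool and the edge removals applied afterwards. Everything here is an assumption made for illustration: the function names (`skeleton`, `check_edge`, `simulate_chain`), the restriction to conditioning sets of size 0 and 1, the Fisher z-test on (partial) correlations, and the use of a thread pool (a process pool would be needed for real speed-ups on CPU-bound work in Python).

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def independent(r, n, k, alpha):
    """Fisher z-test: True if correlation r (k conditioning variables) is not significant."""
    r = max(min(r, 0.999999), -0.999999)
    z = 0.5 * math.log((1 + r) / (1 - r))
    return math.sqrt(n - k - 3) * abs(z) <= NormalDist().inv_cdf(1 - alpha / 2)


def check_edge(job):
    """Test one edge at one level against a frozen snapshot of the graph."""
    i, j, level, adj, corr, n, alpha = job
    if level == 0:
        return i, j, independent(corr[i][j], n, 0, alpha)
    for k in sorted((adj[i] | adj[j]) - {i, j}):
        denom = math.sqrt((1 - corr[i][k] ** 2) * (1 - corr[j][k] ** 2))
        if denom == 0:
            continue
        # First-order partial correlation of i and j given k.
        r = (corr[i][j] - corr[i][k] * corr[j][k]) / denom
        if independent(r, n, 1, alpha):
            return i, j, True
    return i, j, False


def skeleton(data, alpha=0.01, max_level=1, workers=4):
    """data[i] holds the samples of variable i; returns an adjacency dict."""
    p, n = len(data), len(data[0])
    corr = [[1.0 if i == j else pearson(data[i], data[j]) for j in range(p)]
            for i in range(p)]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for level in range(max_level + 1):
            # Freeze the graph so every test in this level sees the same state.
            frozen = {i: frozenset(s) for i, s in adj.items()}
            jobs = [(i, j, level, frozen, corr, n, alpha)
                    for i in frozen for j in frozen[i] if i < j]
            for i, j, indep in pool.map(check_edge, jobs):
                if indep:
                    adj[i].discard(j)
                    adj[j].discard(i)
    return adj


def simulate_chain(n, seed=0):
    """Toy data from the chain X -> Y -> Z with Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [a + rng.gauss(0, 1) for a in x]
    z = [b + rng.gauss(0, 1) for b in y]
    return [x, y, z]
```

On the toy chain from `simulate_chain`, one would expect the X-Z edge to be removed once Y is conditioned on, leaving only X-Y and Y-Z; this is a property of the toy setup, not a result reported in the paper.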
PLOS ONE | 2015
Thuc Duy Le; Lin Liu; Huawen Liu; Jiuyong Li
microRNAs (miRNAs) are important gene regulators at the post-transcriptional level, and inferring miRNA-mRNA regulatory relationships is a crucial problem. Consequently, several computational methods for predicting miRNA targets have been proposed using expression data with or without sequence-based miRNA target information. A typical procedure for applying and evaluating such a method is i) collecting matched miRNA and mRNA expression profiles in a specific condition, e.g. a cancer dataset from The Cancer Genome Atlas (TCGA), ii) applying the new computational method to the selected dataset, iii) validating the predictions against knowledge from the literature and third-party databases, and comparing the performance of the method with some existing methods. This procedure is time-consuming given the time spent collecting and processing data, repeating the work from existing methods, searching the literature and third-party databases to validate the results, and comparing the results from different methods. This prevents researchers from quickly testing new computational models, analysing new datasets, and selecting suitable methods to assist with experiment design. Here, we present an R package, miRLAB, for automating the procedure of inferring and validating miRNA-mRNA regulatory relationships. The package provides a complete set of pipelines for testing new methods and analysing new datasets. miRLAB includes a pipeline to obtain matched miRNA and mRNA expression datasets directly from TCGA, 12 benchmark computational methods for inferring miRNA-mRNA regulatory relationships, functions for validating the predictions using experimentally validated miRNA target data and miRNA perturbation data, and tools for comparing the results from different computational methods.
Advanced Data Mining and Applications | 2014
Ling Li; Huawen Liu; Zongjie Ma; Yuchang Mo; Zhengjie Duan; Jiaqing Zhou; Jianmin Zhao
Multi-label classification has gained extensive attention recently. Compared with traditional classification, multi-label classification allows one instance to be associated with multiple labels. The curse of dimensionality in multi-label data presents a challenge to the performance of multi-label classifiers. Multi-label feature selection is a powerful tool for high-dimensional problems. However, existing feature selection methods are unable to take both computational complexity and label correlation into consideration. To address this problem, a new approach based on information gain for multi-label feature selection (IGMF) is presented in this paper. In IGMF, the information gain between a feature and the label set is exploited to measure the importance of the feature and the label correlations. After that, the optimal feature subset is obtained by setting a threshold value. A series of experimental results show that IGMF can improve the performance of multi-label classifiers.
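As a rough illustration of threshold-based feature selection with information gain, the sketch below scores each discrete feature by summing its information gain over all labels and keeps the features whose score reaches a threshold. The abstract does not state how IGMF combines the gains across the label set, so the summation, the function names (`entropy`, `info_gain`, `select_features`), and the column-wise data layout are all assumptions made for this example, not the paper's formulation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, label):
    """IG(label; feature) = H(label) - H(label | feature) for discrete data."""
    n = len(label)
    conditional = 0.0
    for value in set(feature):
        subset = [l for f, l in zip(feature, label) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(label) - conditional


def select_features(feature_cols, label_cols, threshold):
    """Score each feature by its summed gain over all labels (an assumed
    aggregation) and return the indices at or above the threshold."""
    scores = [sum(info_gain(f, y) for y in label_cols) for f in feature_cols]
    selected = [i for i, s in enumerate(scores) if s >= threshold]
    return selected, scores


# Toy data: four instances, three features, two binary labels.
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
features = [
    [0, 0, 1, 1],  # determines the first label only
    [5, 5, 5, 5],  # constant, carries no information
    [0, 1, 2, 3],  # unique per instance, determines both labels
]
selected, scores = select_features(features, labels, threshold=0.5)
```

In this toy case the constant feature scores zero and is dropped, while the other two features are kept.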
International Journal of Machine Learning and Cybernetics | 2018
Huawen Liu; Zongjie Ma; Jianmin Han; Zhongyu Chen; Zhonglong Zheng
In reality, data objects often belong to several different categories simultaneously, which are semantically correlated with each other. Multi-label learning can handle and extract useful information from such data effectively. Since it has a great variety of potential applications, multi-label learning has attracted widespread attention from many domains. However, two major challenges remain for multi-label learning: high dimensionality and correlations of data. In this paper, we address these problems by using the technique of partial least squares (PLS) and propose a new multi-label learning method called rPLSML (regularized Partial Least Squares for Multi-label Learning). Specifically, we exploit PLS discriminant analysis to identify a latent and common space from the variable and label spaces of data, and then construct a learning model based on the latent space. To tackle the multi-collinearity problem arising from the high dimensionality, an ℓ2-norm penalty is further imposed on the optimization problem. The experimental results on public data sets show that rPLSML performs better than the state-of-the-art multi-label learning algorithms.
IEEE Transactions on Multimedia | 2017
Huawen Liu; Lin Liu; Thuc Duy Le; Ivan Lee; Shiliang Sun; Jiuyong Li
IEEE Transactions on Systems, Man, and Cybernetics | 2017
Huawen Liu; Xuelong Li; Shichao Zhang
IEEE Transactions on Systems, Man, and Cybernetics | 2017
Huawen Liu; Xuelong Li; Jiuyong Li; Shichao Zhang
Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2013
Huawen Liu; Jiuyong Li; Lin Liu; Jixue Liu; Ivan Lee; Jianmin Zhao
Mathematical Problems in Engineering | 2013
Huawen Liu; Zhonglong Zheng; Jianmin Zhao; Ronghua Ye
Cross-view data are collected from two different views or sources about the same subjects. As the information from these views often consolidates and/or complements each other, cross-view data analysis can yield more insights for decision making. A main challenge of cross-view data analysis is how to effectively explore the inherently correlated and high-dimensional data. Dimension reduction offers an effective solution to this problem. However, how to choose the right models and parameters for dimension reduction is still an open problem. In this paper, we propose an effective sparse learning algorithm for cross-view dimensionality reduction. A distinguishing characteristic of our model selection is that it is nonparametric and automatic. Specifically, we represent the correlation of cross-view data using a covariance matrix. Then, we decompose the matrix into a sequence of low-rank ones by solving an optimization problem in an alternating least squares manner. More importantly, a new nonparametric sparsity-inducing function is developed to derive a parsimonious model. Extensive experiments are conducted on real-world data sets to evaluate the effectiveness of the proposed algorithm. The results show that our method is competitive with the state-of-the-art sparse learning algorithms.
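To make the "sequence of low-rank matrices via alternating updates" idea concrete, here is a minimal sketch that extracts sparse rank-1 components from a cross-covariance matrix one at a time and deflates after each. It is not the paper's algorithm: the abstract does not specify its nonparametric sparsity-inducing function, so plain soft-thresholding with a hand-set threshold `t` is swapped in here, and the function names (`rank1_sparse`, `decompose`) and the toy matrix are hypothetical.

```python
import math


def soft(vec, t):
    """Soft-thresholding: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in vec]


def normalize(vec):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def rank1_sparse(C, t=0.0, iters=100):
    """Alternately update u (with v fixed) and v (with u fixed) so that
    d * u v^T approximates C, thresholding after every update."""
    rows, cols = len(C), len(C[0])
    v = normalize([1.0] * cols)
    u = [0.0] * rows
    for _ in range(iters):
        u = normalize(soft(
            [sum(C[i][j] * v[j] for j in range(cols)) for i in range(rows)], t))
        v = normalize(soft(
            [sum(C[i][j] * u[i] for i in range(rows)) for j in range(cols)], t))
    d = sum(u[i] * C[i][j] * v[j] for i in range(rows) for j in range(cols))
    return d, u, v


def decompose(C, k, t=0.0):
    """Extract k sparse rank-1 components, deflating the residual each time."""
    residual = [row[:] for row in C]
    factors = []
    for _ in range(k):
        d, u, v = rank1_sparse(residual, t)
        factors.append((d, u, v))
        residual = [[residual[i][j] - d * u[i] * v[j]
                     for j in range(len(v))] for i in range(len(u))]
    return factors


# Toy cross-covariance: one strong entry plus a weak one that thresholding removes.
C = [[0.0, 3.0],
     [0.0, 0.0],
     [0.1, 0.0]]
d, u, v = decompose(C, k=1, t=0.2)[0]
```

With `t=0` the inner loop reduces to ordinary power iteration for the leading singular pair; a positive threshold zeroes out weak loadings, which is the role a sparsity-inducing function plays in this kind of model.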
International Conference on Pattern Recognition | 2016
Minqi Mao; Zhonglong Zheng; Zhongyu Chen; Huawen Liu; Xiaowei He; Ronghua Ye
Multilabel learning has a wide range of potential applications in practice. It has attracted a great deal of attention in recent years and has been extensively studied in many fields, including image annotation and text categorization. Although many efforts have been made in multilabel learning, two challenging issues remain: how to exploit the correlations and how to tackle the high-dimensional problems of multilabel data. In this paper, an effective algorithm is developed for multilabel classification that utilizes the data relevant to the targets. The key is the construction of a coefficient-based mapping between training and test instances, where the mapping relationship exploits the correlations among the instances rather than the explicit relationship between the variables and the class labels of data. Further, an ℓ1-norm penalty is imposed as a constraint on the mapping relationship to make the model sparse, weakening the impact of noisy data. Our empirical study on eight public datasets shows that the proposed method is more effective than the state-of-the-art multilabel classifiers.