Liang Du
Chinese Academy of Sciences
Publication
Featured research published by Liang Du.
international conference on data mining | 2012
Liang Du; Xuan Li
Nonnegative matrix factorization (NMF) is a popular technique for learning parts-based representations and for data clustering. It usually uses squared residuals to quantify the quality of the factorization, which is optimal for zero-mean Gaussian noise but sensitive to outliers in general. In this paper, we propose a robust NMF method based on the correntropy induced metric, which is much less sensitive to outliers. A half-quadratic optimization algorithm is developed to solve the proposed problem efficiently. The proposed method is further extended to handle outlier rows by incorporating structural knowledge about the outliers. Experimental results on data sets with and without apparent outliers demonstrate the effectiveness of the proposed algorithms.
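As a rough illustration of the half-quadratic idea mentioned in this abstract, the sketch below computes the per-entry weights that a correntropy-style loss typically assigns to residuals. It is not the paper's algorithm: the Gaussian kernel form, the function name `correntropy_weights`, and the bandwidth `sigma` are assumptions chosen for illustration.

```python
import math

def correntropy_weights(residuals, sigma=1.0):
    """Return Gaussian-kernel weights exp(-r^2 / (2 * sigma^2)) per residual.

    In a half-quadratic scheme, weights like these are held fixed while a
    weighted least-squares factorization step is solved, then recomputed.
    `sigma` is a hypothetical kernel bandwidth, not a value from the paper.
    """
    return [math.exp(-(r * r) / (2.0 * sigma * sigma)) for r in residuals]

# Hypothetical residuals: three small errors and one gross outlier.
weights = correntropy_weights([0.0, 0.1, -0.2, 8.0], sigma=1.0)
```

Under these assumptions, the large residual receives a weight close to zero, so it would barely influence the next weighted update, whereas a squared loss lets it dominate.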
knowledge discovery and data mining | 2015
Liang Du
The problem of feature selection has attracted considerable interest in the past decade. Traditional unsupervised methods select the features that faithfully preserve the intrinsic structures of the data, where the intrinsic structures are estimated using all the input features. However, the estimated intrinsic structures are unreliable when redundant and noisy features are not removed. We therefore face a dilemma: one needs the true structures of the data to identify the informative features, and one needs the informative features to accurately estimate the true structures of the data. To address this, we propose a unified learning framework which performs structure learning and feature selection simultaneously. The structures are adaptively learned from the results of feature selection, and the informative features are reselected to preserve the refined structures of the data. By leveraging the interactions between these two essential tasks, we are able to capture accurate structures and select more informative features. Experimental results on many benchmark data sets demonstrate that the proposed method outperforms many state-of-the-art unsupervised feature selection methods.
international conference on data mining | 2014
Lei Shi; Liang Du
In this paper, we consider the problem of unsupervised feature selection. Recently, spectral feature selection algorithms, which leverage both the graph Laplacian and spectral regression, have received increasing attention. However, existing spectral feature selection algorithms suffer from two major problems: 1) since the graph Laplacian is constructed from the original feature space, noisy and irrelevant features may adversely affect the estimated graph Laplacian and hence degrade the quality of the induced graph embedding; 2) since the cluster labels are discrete in nature, relaxing and approximating these labels into a continuous embedding inevitably introduces noise into the estimated cluster labels. If the noise in the cluster labels is not considered, the feature selection process may be misguided. We propose a Robust Spectral learning framework for unsupervised Feature Selection (RSFS), which jointly improves the robustness of graph embedding and sparse spectral regression. Compared with existing methods, which are sensitive to noisy features, our proposed method uses a robust local learning method to construct the graph Laplacian and a robust spectral regression method to handle the noise in the learned cluster labels. To solve the proposed optimization problem, an efficient iterative algorithm is proposed. We also show the close connection between the proposed robust spectral regression and the robust Huber M-estimator. Experimental results on different datasets show the superiority of RSFS.
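For readers unfamiliar with the Huber M-estimator referenced in this abstract, here is a minimal sketch of the standard Huber loss. It only illustrates the textbook definition; the function name `huber_loss` and the threshold `delta` are illustrative assumptions, and the code does not reproduce the RSFS objective.

```python
def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear in the tails.

    `delta` is the (assumed) threshold separating the two regimes.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

small = huber_loss(0.5)  # quadratic regime
large = huber_loss(3.0)  # linear regime, grows more slowly than 0.5 * r^2
```

Because the loss grows only linearly for large residuals, noisy labels pull the fit less than they would under a squared loss.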
international conference on data mining | 2013
Liang Du; Zhiyong Shen; Xuan Li; Peng Zhou
In this paper, we consider the problem of feature selection in the unsupervised learning scenario. Recently, spectral feature selection methods, which leverage both the graph Laplacian and the learning mechanism, have received considerable attention. However, when there are many irrelevant or noisy features, such graphs may be unreliable and can mislead the selection of features. We propose Local and Global Discriminative learning for unsupervised Feature Selection (LGDFS), which integrates a global regression model and a set of locally linear regression models with weighted l2-norm regularization into a unified learning framework. By exploring the discriminative and geometrical information in the weighted feature space, which alleviates the effects of the irrelevant features, our approach can find the most representative features that respect the cluster structure of the data. Experimental results on several benchmark data sets are provided to validate the effectiveness of the proposed approach.
advanced data mining and applications | 2011
Liang Du; Xuan Li
Item recommendation from implicit, positive-only feedback is an emerging setup in collaborative filtering in which only examples of one class are observed. In this paper, we propose a novel method, called User Graph regularized Pairwise Matrix Factorization (UGPMF), to seamlessly integrate user information into the pairwise matrix factorization procedure. By using the information available on the user side, we are able to find more compact, low-dimensional representations for users and items. Experiments on real-world recommendation data sets demonstrate that the proposed method significantly outperforms various competing methods on the top-k ranking performance of the one-class item recommendation task.
IEEE Transactions on Knowledge and Data Engineering | 2013
Xuan Li; Liang Du
Due to the fast evolution of information on the Internet, update summarization has received much attention in recent years. The task is to summarize an evolving document collection at the current time, assuming that users have already read some related earlier documents. In this paper, we propose a graph-ranking-based method. It performs constrained reinforcements on a sentence graph, which unifies previous and current documents, to determine the salience of the sentences. The constraints ensure that the most salient sentences in the current documents are updates to the previous documents. Since this formulation is NP-hard, we then propose an approximate method that is solvable in polynomial time. Experiments on the TAC 2008 and 2009 benchmark data sets show the effectiveness and efficiency of our method.
conference on information and knowledge management | 2010
Xuan Li; Liang Du; Chenyan Xiong
Novelty, coverage and balance are important requirements in topic-focused summarization, which to a large extent determine the quality of a summary. In this paper, we propose a novel method that incorporates these requirements into a sentence ranking probability model. It differs from the existing methods in that the novelty, coverage and balance requirements are all modeled w.r.t. a given topic, so that summaries are highly relevant to the topic and at the same time comply with topic-aware novelty, coverage and balance. Experimental results on the DUC 2005, 2006 and 2007 benchmark data sets demonstrate the effectiveness of our method.
Neurocomputing | 2015
Nannan Gu; Mingyu Fan; Liang Du; Dongchun Ren
Though the Fisher score is a representative and effective feature selection method, it has an unsolved drawback: it either evaluates the features individually and selects the top features, or selects features using sequential search strategies. The individual method ignores the mutual relationships among the selected features, while the sequential methods suffer from heavy computation. In this work, we present an efficient sequential feature selection method. In the proposed method, the generalized Fisher score is used as a robust measurement of the discriminative ability of the features, which can naturally deal with the small sample size problem. In addition, each feature is considered as a pattern vector, and an adaptive eigenspace model is applied to update the generalized Fisher score. In the proposed adaptive eigenspace model, the size of the eigen-decomposition problems does not increase with the number of selected features, but is determined by the dimension of the adaptive eigenspace. If the dimension of the adaptive eigenspace model is fixed, the proposed algorithm takes approximately constant time to evaluate a candidate feature. Therefore, the proposed method is computationally more efficient than traditional sequential methods. Experiments on six widely used face databases are conducted to demonstrate the efficacy of the proposed approach.
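To make the "individual" evaluation strategy described in this abstract concrete, the following sketch computes the classical per-feature Fisher score (between-class scatter divided by within-class scatter). It is the textbook baseline, not the generalized Fisher score or the adaptive eigenspace model from the paper; the function name `fisher_score` and the toy data are assumptions.

```python
def fisher_score(values, labels):
    """Classical Fisher score of one feature.

    values: the feature's value for each sample
    labels: the class label of each sample
    Returns between-class scatter / within-class scatter.
    """
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        mean_c = sum(group) / len(group)
        between += len(group) * (mean_c - overall_mean) ** 2
        within += sum((v - mean_c) ** 2 for v in group)
    # A feature with zero within-class scatter separates classes perfectly.
    return between / within if within > 0 else float("inf")

labels = [0, 0, 1, 1]
informative = fisher_score([1.0, 3.0, 5.0, 7.0], labels)  # class means differ
noisy = fisher_score([1.0, 7.0, 3.0, 5.0], labels)        # class means coincide
```

Scoring each feature on its own like this is cheap, but, as the abstract notes, it cannot account for redundancy among the features that end up selected together.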
international conference on data mining | 2014
Liang Wu; Liang Du; Bo Liu; Guandong Xu; Yong Ge; Yanjie Fu; Jianhui Li; Yuanchun Zhou; Hui Xiong
The goal of software artifact retrieval is to effectively locate software artifacts, such as a piece of source code, in a large code repository. This problem has traditionally been addressed through textual queries: information retrieval techniques are applied based on the textual similarity between queries and the textual representations of software artifacts, which are generated by collecting words from comments, identifiers, and descriptions of programs. However, in addition to this semantic information, there is rich information embedded in the source code itself. Source code, if analyzed properly, can be a rich source for enhancing software artifact retrieval. To this end, we develop a feature extraction method for source code. Specifically, this method can capture both the inherent information in the source code and the semantic information hidden in its comments, descriptions, and identifiers. Moreover, we design a heterogeneous metric learning approach, which integrates code features and text features into the same latent semantic space. This, in turn, helps to measure artifact similarity by exploiting the joint power of both code and text features. Finally, extensive experiments on real-world data show that the proposed method improves the performance of software artifact retrieval by a significant margin.
web information systems engineering | 2013
Jun Deng; Liang Du
Due to the massive explosion of multimedia content on the web, users demand a new type of information retrieval, called cross-modal multimedia retrieval, where users submit queries of one media type and get results of various other media types. Performing effective retrieval of heterogeneous multimedia content brings new challenges. One essential aspect of these challenges is to learn a heterogeneous metric between different types of multimedia objects. In this paper, we propose a Bayesian personalized ranking based heterogeneous metric learning (BPRHML) algorithm, which optimizes for correctly ranking the retrieval results. It uses pairwise preference constraints as training data and explicitly optimizes for preserving these constraints. To further encourage the smoothness of the learned results, we integrate graph regularization with Bayesian personalized ranking. The experimental results on two publicly available datasets show the effectiveness of our method.
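The pairwise preference constraints described in this abstract can be illustrated with the standard Bayesian personalized ranking (BPR) loss for a single pair. This is a minimal sketch of the generic BPR criterion only; the function name `bpr_loss` and the example scores are assumptions, and the heterogeneous metric and graph regularization from the paper are not modeled.

```python
import math

def bpr_loss(score_preferred, score_other):
    """Generic BPR pairwise loss: -log(sigmoid(score_preferred - score_other)).

    Computed as log(1 + exp(-diff)) in a numerically stable way.
    """
    diff = score_preferred - score_other
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Hypothetical similarity scores for a relevant and an irrelevant result.
correct_order = bpr_loss(2.0, 0.0)  # relevant result already ranked higher
tied = bpr_loss(0.0, 0.0)           # no preference expressed by the scores
wrong_order = bpr_loss(0.0, 2.0)    # irrelevant result ranked higher
```

The loss shrinks as the preferred result's score exceeds the other's, so minimizing it over many pairs pushes the model toward rankings that preserve the training preferences.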