Liping Jing
Beijing Jiaotong University
Publications
Featured research published by Liping Jing.
Knowledge and Information Systems | 2010
Liping Jing; Michael K. Ng; Joshua Zhexue Huang
This paper presents a new knowledge-based vector space model (VSM) for text clustering. In the new model, semantic relationships between terms (e.g., words or concepts) are included in representing text documents as a set of vectors. The idea is to calculate the dissimilarity between two documents more effectively so that text clustering results can be improved. The semantic relationship between two terms is defined by their similarity, which is used to re-weight term frequency in the VSM. We study two different similarity measures for computing the semantic relationship between terms. The first approach is based on existing ontologies such as WordNet and MeSH: we define a new similarity measure that combines the edge-counting technique, the average distance and the position weighting method to compute the similarity of two terms from an ontology hierarchy. The second approach makes use of text corpora to construct the relationships between terms and then calculate their semantic similarities. Three clustering algorithms, bisecting k-means, feature weighting k-means and a hierarchical clustering algorithm, have been used to cluster real-world text data represented in the new knowledge-based VSM. The experimental results show that clustering performance based on the new model is much better than that based on the traditional term-based VSM.
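The re-weighting idea above can be sketched in a few lines: each document vector d becomes S·d, where S is a term-term similarity matrix. The similarity values and toy documents below are illustrative, not taken from the paper.

```python
import math

def reweight(doc, sim):
    """Re-weight term frequencies with a term-term similarity matrix."""
    n = len(doc)
    return [sum(sim[i][j] * doc[j] for j in range(n)) for i in range(n)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Terms: ["car", "automobile", "banana"]; "car" and "automobile" are related.
sim = [[1.0, 0.8, 0.0],
       [0.8, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
d1 = [2, 0, 0]   # mentions "car" only
d2 = [0, 3, 0]   # mentions "automobile" only

plain = cosine(d1, d2)                               # 0.0 in the term-based VSM
semantic = cosine(reweight(d1, sim), reweight(d2, sim))
print(plain, round(semantic, 3))
```

Under the plain term-based VSM the two documents look unrelated (cosine 0), while the semantically re-weighted vectors correctly come out highly similar.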
IEEE Transactions on Image Processing | 2012
Liping Jing; Chao Zhang; Michael K. Ng
In this paper, we propose a novel supervised nonnegative matrix factorization-based framework for both image classification and annotation. The framework consists of two phases: training and prediction. In the training phase, two supervised nonnegative matrix factorizations, one for image descriptors and one for annotation terms, are combined to identify the latent image bases and to represent the training images in the basis space. These latent bases capture the representation of the images in terms of both descriptors and annotation terms. Based on this new representation of the training images, classifiers can be trained. In the prediction phase, a test image is first represented in the latent bases by solving a linear least squares problem, and then its class label and annotation are predicted by the trained classifiers and the proposed annotation mapping model. We develop a three-block proximal alternating nonnegative least squares algorithm to determine the latent image bases, and show its convergence. Extensive experiments on real-world image data sets suggest that the proposed framework predicts labels and annotations for test images successfully, and that the algorithm is computationally efficient and effective for image classification and annotation.
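A minimal sketch of the prediction step: a test vector x is represented in learnt nonnegative bases U by solving min_h ||Uh − x||² with h ≥ 0. Projected gradient descent stands in here for the least-squares solver; U and x are toy values, not from the paper.

```python
def nnls(U, x, steps=500, lr=0.05):
    """Nonnegative least squares via projected gradient descent."""
    m, k = len(U), len(U[0])
    h = [0.0] * k
    for _ in range(steps):
        # residual r = U h - x
        r = [sum(U[i][j] * h[j] for j in range(k)) - x[i] for i in range(m)]
        # gradient g = U^T r
        g = [sum(U[i][j] * r[i] for i in range(m)) for j in range(k)]
        # gradient step, then projection onto the nonnegative orthant
        h = [max(0.0, h[j] - lr * g[j]) for j in range(k)]
    return h

U = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
x = [2.0, 3.0, 5.0]          # exactly 2*basis0 + 3*basis1
h = nnls(U, x)
print([round(v, 2) for v in h])  # recovers coefficients close to [2.0, 3.0]
```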
Computer Vision and Pattern Recognition | 2015
Liping Jing; Liu Yang; Jian Yu; Michael K. Ng
Multi-label problems arise in various domains, including automatic multimedia data categorization, and have generated significant interest in the computer vision and machine learning communities. However, existing methods do not adequately address two key challenges: exploiting correlations between labels and compensating for a lack of labeled data or even missing labels. In this paper, we propose a semi-supervised low-rank mapping (SLRM) model to handle these two challenges. SLRM applies nuclear norm regularization to the mapping to effectively capture label correlations, and introduces a manifold regularizer on the mapping to capture the intrinsic structure of the data, which reduces the amount of labeled data required while improving classification performance. Furthermore, we design an efficient algorithm to solve the SLRM model based on the alternating direction method of multipliers, so it can handle large-scale datasets. Experiments on four real-world multimedia datasets demonstrate that the proposed method exploits label correlations and obtains better label predictions than state-of-the-art methods.
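One standard way a nuclear-norm term is handled inside such a solver is singular-value soft-thresholding: prox_{τ||·||*}(M) = U diag(max(s_i − τ, 0)) Vᵀ. With the SVD assumed precomputed, the thresholding itself is one line; the singular values and τ below are purely illustrative.

```python
def soft_threshold_singular_values(s, tau):
    """Shrink singular values toward zero, driving small ones to exactly 0."""
    return [max(v - tau, 0.0) for v in s]

# Hypothetical singular values of a mapping matrix.
s = [3.0, 1.2, 0.4, 0.05]
shrunk = soft_threshold_singular_values(s, 0.5)
print(shrunk)
```

The two smallest singular values are zeroed out, which is what makes the regularized mapping low-rank.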
Expert Systems With Applications | 2012
Jiali Yun; Liping Jing; Jian Yu; Houkuan Huang
Text categorization is one of the most common themes in data mining and machine learning. Unlike structured data, unstructured text data are more difficult to analyze because they contain complex syntactic and semantic information. In this paper, we propose a two-level representation model (2RM) for text data: one level represents syntactic information and the other semantic information. At the syntactic level, each document is represented as a term vector whose components are term frequency-inverse document frequency (TF-IDF) values. At the semantic level, each document is represented by the Wikipedia concepts related to its terms. We also design a multi-layer classification framework (MLCLA) that makes use of the semantic and syntactic information in the 2RM model. MLCLA contains three classifiers: two are applied in parallel at the syntactic and semantic levels, and their outputs are combined and fed to a third classifier, which produces the final result. Experimental results on benchmark data sets (20Newsgroups, Reuters-21578 and Classic3) show that the 2RM model with the MLCLA framework improves text classification performance compared with existing flat text representation models (term-based VSM, term semantic kernel model, concept-based VSM, concept semantic kernel model and term+concept VSM) combined with existing classification methods.
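The three-classifier arrangement can be sketched as a stacked ensemble: two base classifiers score a document at the syntactic and semantic levels, and a third combines their outputs. The scoring functions below are hypothetical stand-ins, not the models used in the paper.

```python
def syntactic_score(doc):
    """Stand-in for a TF-IDF-based classifier's score for class 'sports'."""
    return 0.7 if "goalkeeper" in doc else 0.2

def semantic_score(doc):
    """Stand-in for a Wikipedia-concept-based classifier's score."""
    return 0.8 if "football" in doc else 0.3

def meta_classifier(s1, s2):
    """Third classifier: combines the two level-specific outputs."""
    return "sports" if 0.5 * s1 + 0.5 * s2 > 0.5 else "other"

doc = ["the", "goalkeeper", "saved", "the", "football"]
label = meta_classifier(syntactic_score(doc), semantic_score(doc))
print(label)  # -> sports
```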
International Journal of Granular Computing, Rough Sets and Intelligent Systems | 2009
Michael K. Ng; Liping Jing
This correspondence describes extensions to the fuzzy k-modes algorithm for clustering categorical data. We modify the simple matching dissimilarity measure for categorical objects, which allows the fuzzy k-modes paradigm to obtain clusters with strong intra-similarity and to cluster large categorical data sets efficiently. We rigorously derive the updating formulas of the fuzzy k-modes clustering algorithm with the new dissimilarity measure, and prove the convergence of the algorithm under the optimisation framework. Experimental results illustrate that the new fuzzy k-modes algorithm is more effective than existing k-modes algorithms.
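For orientation, the two ingredients of the fuzzy k-modes setup can be sketched directly: the simple matching dissimilarity counts disagreeing attributes, and fuzzy memberships of an object are derived from its dissimilarities to the k cluster modes (here with fuzziness exponent α = 2; the records are toy data).

```python
def simple_matching(x, y):
    """Number of attributes on which two categorical objects disagree."""
    return sum(1 for a, b in zip(x, y) if a != b)

def memberships(dists, alpha=2.0):
    """Fuzzy memberships of one object to k clusters from its dissimilarities."""
    out = []
    for d in dists:
        if d == 0:  # object coincides with a mode: crisp assignment
            return [1.0 if e == 0 else 0.0 for e in dists]
        out.append(1.0 / sum((d / e) ** (1.0 / (alpha - 1)) for e in dists))
    return out

x = ["red", "small", "round"]
y = ["red", "large", "round"]
d = simple_matching(x, y)
u = memberships([1.0, 3.0])
print(d, [round(v, 2) for v in u])
```

The memberships sum to one, with the closer mode receiving the larger weight.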
IEEE Transactions on Neural Networks | 2013
Liping Jing; Michael K. Ng; Tieyong Zeng
In this paper, we study a dictionary learning (DL) approach to identify low-dimensional subspace representations of high-dimensional and nonnegative data. Such a representation can be used to provide an affinity matrix among different subspaces for data clustering. The main contribution of this paper is to impose nonnegativity and sparsity constraints together in DL, so that data can be represented effectively by nonnegative, sparse coding coefficients and nonnegative dictionary bases. The algorithm employs the proximal point technique for the resulting DL and sparsity optimization problem, and the coding coefficients are then used to perform spectral clustering (SC) for data partitioning. Extensive experiments on real-world high-dimensional and nonnegative data sets, including text, microarray, and image data, demonstrate that the proposed method can discover their subspace structures. Experimental results also show that our algorithm is computationally efficient and achieves high SC performance with interpretable clustering results compared with the other tested methods.
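The combined nonnegativity-plus-sparsity constraint has a simple proximal form: the prox of λ||h||₁ restricted to h ≥ 0 is one-sided soft-thresholding, applied after each gradient step on the reconstruction error. A minimal sketch with toy numbers (λ = 0.1 here is illustrative):

```python
def prox_nonneg_l1(h, lam):
    """Proximal operator of lam*||h||_1 over the nonnegative orthant."""
    return [max(v - lam, 0.0) for v in h]

# Hypothetical coding coefficients after a gradient step.
h = [0.9, 0.05, -0.3, 0.4]
coded = prox_nonneg_l1(h, 0.1)
print([round(v, 2) for v in coded])
```

Small and negative entries are driven to exactly zero, which is what makes the resulting coding coefficients both sparse and nonnegative.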
IEEE Transactions on Image Processing | 2015
Liu Yang; Liping Jing; Michael K. Ng
Heterogeneous transfer learning has recently gained much attention as a machine learning paradigm in which knowledge is transferred from source domains to target domains in different feature spaces. Existing works usually assume that source domains can provide accurate and useful knowledge to be transferred to target domains. In practice, noise may appear in the given source (text) and target (image) domain data, and the performance of transfer learning can be seriously degraded. In this paper, we propose a robust and nonnegative collective matrix factorization model to handle noise in text-to-image transfer learning and to build a reliable bridge for transferring accurate and useful knowledge from the text domain to the image domain. The proposed model can be solved by an efficient iterative method whose convergence can be shown. Extensive experiments on real data sets suggest that the proposed model effectively performs transfer learning in noisy text and image domains, and is superior to popular existing methods for text-to-image transfer learning.
IEEE Transactions on Neural Networks | 2016
Liu Yang; Liping Jing; Jian Yu; Michael K. Ng
One of the main research problems in heterogeneous transfer learning is to determine whether a given source domain is effective in transferring knowledge to a target domain, and then how much knowledge should be transferred. The main objective of this paper is to solve this problem by evaluating the relatedness among given domains through transferred weights. We propose a novel method to learn such transferred weights with the aid of co-occurrence data, which contain the same set of instances but in different feature spaces. Because instances in the same category should have similar features, our method computes their principal components in each feature space so that the co-occurrence data can be re-represented by these principal components. The principal component coefficients from different feature spaces for the same instance have the same order of significance for describing the category information. Using these coefficients, the Markov chain Monte Carlo method is employed to construct a directed cyclic network in which each node is a domain and each edge weight is the conditional dependence from one domain to another. The edge weights of the network serve as the transferred weights from a source domain to a target domain, and can be taken as priors for setting parameters in existing heterogeneous transfer learning methods to control the amount of knowledge transferred. Experimental results on synthetic and real-world data sets illustrate that the proposed method captures strong and weak relations among feature spaces and enhances the learning performance of heterogeneous transfer learning.
Knowledge Discovery and Data Mining | 2011
Liping Jing; Jiali Yun; Jian Yu; Joshua Zhexue Huang
The language modeling approach has been widely used in recent years to improve the performance of text mining because of its solid theoretical foundation and empirical effectiveness. In essence, this approach centers on estimating an accurate model by choosing appropriate language models and smoothing techniques. Semantic smoothing, which incorporates semantic and contextual information into the language models, is effective and potentially significant for improving text mining performance. In this paper, we propose a high-order structure to represent text data by incorporating background knowledge from Wikipedia. The proposed structure consists of three types of objects: terms, documents and concepts. Moreover, we combine a high-order co-clustering algorithm with the proposed model to simultaneously cluster documents, terms and concepts. Experimental results on benchmark data sets (20Newsgroups and Reuters-21578) show that the proposed high-order co-clustering on the high-order structure outperforms general co-clustering algorithms on bipartite text data, such as document-term, document-concept and document-(term+concept).
International Conference of the IEEE Engineering in Medicine and Biology Society | 2010
Liping Jing; Michael K. Ng; Ying Liu
Gene regulatory networks have long been studied in model organisms as a means of identifying functional relationships among genes or their products. Despite many existing methods for genome-wide construction of such networks, the gene regulatory network problem remains nontrivial. Here, we present HAEO, a hybrid approach based on gene expression profiles and gene ontology. HAEO uses multiple methods (overlapping clustering and reverse engineering) to effectively and efficiently construct gene regulatory networks from multiple sources (gene expression profiles and gene ontology). Application to a yeast cell cycle dataset demonstrates HAEO's ability to construct validated gene regulatory networks, including potential regulatory pairs that cannot be discovered by general inference methods, and to identify cycles (i.e., feedback loops) between genes. We also experimentally study the efficiency of network construction and show that HAEO is much faster than the Bayesian network method.