Publication


Featured research published by Huifang Ma.


International Conference on Cloud Computing | 2009

Parallel K-Means Clustering Based on MapReduce

Weizhong Zhao; Huifang Ma; Qing He

Data clustering has received considerable attention in many applications, such as data mining, document retrieval, image segmentation, and pattern classification. The growing volume of information produced by technological progress makes clustering very large-scale data a challenging task. To deal with this problem, many researchers have tried to design efficient parallel clustering algorithms. In this paper, we propose a parallel k-means clustering algorithm based on MapReduce, which is a simple yet powerful parallel programming technique. The experimental results demonstrate that the proposed algorithm can scale well and efficiently process large datasets on commodity hardware.
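As a rough illustration of how k-means can be phrased in map and reduce steps, the sketch below simulates the phases in plain Python on a single machine. It is not the paper's implementation: the function names (`map_phase`, `combine`, `reduce_phase`, `kmeans`) and the idea of passing pre-split data as a list of lists are assumptions made for this example.

```python
import math

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def map_phase(points, centroids):
    """Map: emit (centroid index, (point, 1)) for every point in a split."""
    for p in points:
        yield nearest(p, centroids), (p, 1)

def combine(pairs):
    """Combine: per key, accumulate the coordinate-wise sum and the count."""
    sums = {}
    for key, (vec, n) in pairs:
        if key in sums:
            acc, cnt = sums[key]
            sums[key] = ([a + b for a, b in zip(acc, vec)], cnt + n)
        else:
            sums[key] = (list(vec), n)
    return sums

def reduce_phase(sums, centroids):
    """Reduce: new centroid = sum / count; an empty cluster keeps its old centroid."""
    new = list(centroids)
    for key, (acc, cnt) in sums.items():
        new[key] = tuple(a / cnt for a in acc)
    return new

def kmeans(splits, centroids, iterations=10):
    """Run a fixed number of map/combine/reduce rounds over the data splits."""
    for _ in range(iterations):
        partial = []
        for split in splits:  # in a real cluster, each split goes to its own mapper
            partial.extend(combine(map_phase(split, centroids)).items())
        centroids = reduce_phase(combine(partial), centroids)
    return centroids
```

For example, `kmeans([[(0, 0), (0, 1)], [(10, 10), (10, 11)]], [(0, 0), (10, 10)])` moves each centroid to the mean of its two nearby points. Summing partial results per split before the reduce step keeps the data exchanged between phases proportional to the number of clusters rather than the number of points.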


Knowledge and Information Systems | 2012

Effective semi-supervised document clustering via active learning with instance-level constraints

Weizhong Zhao; Qing He; Huifang Ma; Zhongzhi Shi

Semi-supervised document clustering, which takes into account limited supervised data to group unlabeled documents into clusters, has received significant interest recently. Because obtaining supervised data may be expensive, it is important to acquire the most informative knowledge to improve clustering performance. This paper presents a semi-supervised document clustering algorithm and a new method for actively selecting informative instance-level constraints to improve clustering performance. The semi-supervised document clustering algorithm is a Constrained DBSCAN (Cons-DBSCAN) algorithm, which incorporates instance-level constraints to guide the clustering process in DBSCAN. An active learning approach is proposed to select informative document pairs for obtaining user feedback. Experimental results show that Cons-DBSCAN with our proposed active learning approach can improve clustering performance significantly when given a relatively small number of constraints.
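To make the notion of instance-level constraints concrete, here is a small sketch that assumes they take the common must-link / cannot-link pair form. It does not reproduce Cons-DBSCAN or the active selection strategy; the helper names `must_link_groups` and `count_violations` are hypothetical and only show how such constraints can be represented and checked against a clustering.

```python
def must_link_groups(n, must_link):
    """Group documents 0..n-1 by the transitive closure of must-link pairs (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def count_violations(labels, must_link, cannot_link):
    """Count the constraints that a cluster assignment breaks."""
    broken = sum(1 for a, b in must_link if labels[a] != labels[b])
    broken += sum(1 for a, b in cannot_link if labels[a] == labels[b])
    return broken
```

With five documents and must-link pairs `(0, 1)` and `(1, 2)`, `must_link_groups` places documents 0, 1, and 2 in one group, since must-link relations are transitive.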


Knowledge Discovery and Data Mining | 2010

Orthogonal nonnegative matrix tri-factorization for semi-supervised document co-clustering

Huifang Ma; Weizhong Zhao; Qing Tan; Zhongzhi Shi

Semi-supervised clustering is often viewed as using labeled data to aid the clustering process. However, existing algorithms fail to consider dual constraints between data points (e.g., documents) and features (e.g., words). To address this problem, we propose a novel semi-supervised document co-clustering model, OSS-NMF, via orthogonal nonnegative matrix tri-factorization. Our model incorporates prior knowledge on both the document side and the word side to aid the construction of the new word-category and document-cluster matrices. In addition, we prove the correctness and convergence of our model to demonstrate its mathematical rigor. Our experimental evaluations show that the proposed document clustering model presents remarkable performance improvements with certain constraints.


Knowledge and Information Systems | 2013

A nonnegative matrix factorization framework for semi-supervised document clustering with dual constraints

Huifang Ma; Weizhong Zhao; Zhongzhi Shi

In this paper, we propose a new semi-supervised co-clustering algorithm, Orthogonal Semi-Supervised Nonnegative Matrix Factorization (OSS-NMF), for document clustering. In this new approach, the clustering process is carried out by incorporating both prior domain knowledge of data points (documents) in the form of pair-wise constraints and category knowledge of features (words) into the NMF co-clustering framework. Under this framework, the clustering problem is formulated as the problem of finding a local minimizer of the objective function, taking into account the dual prior knowledge. The update rules are derived, and an iterative algorithm is designed for the co-clustering process. Theoretically, we prove the correctness and convergence of our algorithm and demonstrate its mathematical rigor. Our experimental evaluations show that the proposed document clustering model presents remarkable performance improvements with those constraints.


Computer and Information Technology | 2009

A Probabilistic Model for Automatic Image Annotation and Retrieval

Zhixin Li; Huifang Ma; Zhiping Shi; Zhongzhi Shi

Automatic image annotation has become an important and challenging problem due to the existence of the semantic gap. In this paper, we present an approach based on probabilistic latent semantic analysis (PLSA) to accomplish this task. In order to model training data precisely, an image is first represented as a bag of visual words, and then a probabilistic structure with two PLSA models is employed to capture semantic information from the visual and textual modalities respectively. Furthermore, an adaptive asymmetric learning approach is proposed to fuse the aspects of these two models. For each image document, the distributions over aspects of the different models are fused by multiplying them by different weights, which are determined by the entropy of the feature distribution. Consequently, the two models are linked with the same distribution over all aspects. This structure can predict semantic annotation well for an unseen image because it associates visual and textual modalities properly. We compare our approach with several previous approaches on a standard Corel dataset. The experimental results show that our approach performs more effectively and accurately.


International Conference on Machine Learning and Cybernetics | 2008

Geodesic distance based approach for sentence similarity computation

Huifang Ma; Qing He; Zhongzhi Shi

This paper presents a novel approach based on geodesic distance for sentence similarity computation, which can be used in a query-based information retrieval system. Unlike traditional distance methods, geodesic distance takes into account the spatial relationships of sentences, which better reflects the intrinsic geometric structure of the sentence manifold. Experiments demonstrate that the proposed method correlates better with human intuition than the traditional Euclidean method.
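One common way to approximate geodesic distance on a manifold is to connect each point to its nearest neighbours and take shortest-path lengths through that graph. The sketch below follows that generic recipe and is not necessarily the paper's exact procedure; the 2-D points stand in for sentence vectors, and `knn_graph`, `geodesic`, and the choice of `k` are assumptions made for illustration.

```python
import heapq
import math

def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def geodesic(graph, source, target):
    """Shortest-path length from source to target (Dijkstra); inf if unreachable."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

# Nine points on a half circle: the two ends are close in a straight line
# but far apart when travelling along the curve.
points = [(math.cos(i * math.pi / 8), math.sin(i * math.pi / 8)) for i in range(9)]
graph = knn_graph(points, k=2)
straight = math.dist(points[0], points[-1])
along_curve = geodesic(graph, 0, 8)
```

Here `along_curve` is larger than `straight`, which is the effect the abstract describes: the graph-based distance follows the shape of the data instead of cutting across it.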


Asia-Pacific Web Conference | 2010

Combining the Missing Link: An Incremental Topic Model of Document Content and Hyperlink

Huifang Ma; Zhixin Li; Zhongzhi Shi

The content and structure of linked information, such as sets of web pages or research paper archives, are dynamic and keep changing. Even though different methods have been proposed to exploit both the link structure and the content information, no existing approach can effectively deal with this evolution. We propose a novel joint model, called Link-IPLSI, to combine texts and links in a topic modeling framework incrementally. The model takes advantage of a novel link updating technique that can cope with dynamic changes of online document streams in a faster and more scalable way. Furthermore, an adaptive asymmetric learning method is adopted to freely control the assignment of weights to terms and citations. Experimental results on two different sources of online information demonstrate the time-saving strength of our method and indicate that our model leads to systematic improvements in the quality of classification and link prediction.


Web Intelligence | 2009

Active Learning of Instance-Level Constraints for Semi-supervised Document Clustering

Weizhong Zhao; Qing He; Huifang Ma; Zhongzhi Shi

This paper presents a framework that actively selects informative document pairs for semi-supervised document clustering. The semi-supervised document clustering algorithm is a Constrained DBSCAN (Cons-DBSCAN), which incorporates instance-level constraints to guide the clustering process in DBSCAN. By obtaining user feedback, our proposed active learning algorithm can acquire informative instance-level constraints to aid the clustering process. Experimental results show that Cons-DBSCAN with the proposed active learning approach can provide appealing clustering performance.


Asia Information Retrieval Symposium | 2009

IPHITS: An Incremental Latent Topic Model for Link Structure

Huifang Ma; Weizhong Zhao; Zhixin Li; Zhongzhi Shi

The structure of linked documents is dynamic and keeps changing. Even though different methods have been proposed to exploit the link structure in identifying hubs and authorities in a set of linked documents, no existing approach can effectively deal with these changes. This paper explores changes in linked documents and proposes an incremental link probabilistic framework, which we call IPHITS. The model deals with online document streams in a faster, more scalable way and uses a novel link updating technique that can cope with dynamic changes. Experimental results on two different sources of online information demonstrate the time-saving strength of our method. In addition, we analyze the stability of the rankings under small perturbations to the linkage patterns.
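For background on hubs and authorities, the sketch below implements the classical, non-incremental HITS iteration. It is not IPHITS, which is probabilistic and incremental; the function name `hits` and the dictionary-based link representation are assumptions chosen for this example.

```python
import math

def hits(links, iterations=50):
    """Classical HITS. links maps each node to the list of nodes it points to.

    Returns (hub, authority) score dictionaries, each L2-normalised.
    """
    nodes = set(links) | {v for targets in links.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {n: 0.0 for n in nodes}
        for u, targets in links.items():
            for v in targets:
                auth[v] += hub[u]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {u: sum(auth[v] for v in links.get(u, ())) for u in nodes}
        # Normalise both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth
```

Calling `hits({"a": ["c"], "b": ["c"], "c": []})` ranks `c` as the top authority and gives `a` and `b` equal hub scores. Because every call recomputes the scores from the full link structure, a batch method like this has to be rerun when links change, which is the cost an incremental model aims to avoid.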


Fuzzy Systems and Knowledge Discovery | 2011

The improved non-negative Matrix Factorization algorithm for document clustering

Weizhong Zhao; Huifang Ma; Qing He; Zhongzhi Shi

Non-negative Matrix Factorization (NMF) is a recently proposed approach for obtaining document clusters that aims to provide a minimum-error non-negative representation of the term-document matrix. In this paper, we extend the classical NMF approach by imposing sparseness constraints explicitly. The new model can learn a much sparser matrix factorization. In addition, an objective function is defined to impose the sparseness constraint alongside the non-negativity constraint. Experimental results on real-world document datasets show that the proposed method can perform document clustering effectively and efficiently.
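As a point of reference, the classical NMF that this abstract extends can be sketched with the well-known multiplicative update rules for the squared (Frobenius) error. The code below covers only that baseline and does not include the paper's sparseness constraint; the helper names, the small `eps` guard against division by zero, and the toy matrix are assumptions made for illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (terms x documents) as W @ H with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Toy term-document matrix with two blocks of related terms and documents.
V = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
W, H = nmf(V, rank=2)
```

Each column of `H` describes one document in terms of the `rank` latent factors, so a simple clustering rule is to assign a document to the factor with the largest entry in its column. The updates only multiply by non-negative ratios, which is why `W` and `H` stay non-negative throughout.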

Collaboration


Dive into Huifang Ma's collaboration.

Top Co-Authors

Zhongzhi Shi

Chinese Academy of Sciences

Qing He

Chinese Academy of Sciences

Weizhong Zhao

Chinese Academy of Sciences

Zhixin Li

Chinese Academy of Sciences

Qing Tan

Chinese Academy of Sciences
