Publication


Featured research published by Fuzhen Zhuang.


Neurocomputing | 2013

Parallel extreme learning machine for regression based on MapReduce

Qing He; Tianfeng Shang; Fuzhen Zhuang; Zhongzhi Shi

Regression is one of the most basic problems in data mining. For regression problems, the extreme learning machine (ELM) can achieve good generalization performance at a much faster learning speed than traditional gradient-based training. However, the growing volume of data makes ELM regression on very large datasets a challenging task. By analyzing the mechanism of the ELM algorithm, an efficient parallel ELM for regression is designed and implemented on the MapReduce framework, currently a simple but powerful parallel programming model. The experimental results demonstrate that the proposed parallel ELM can efficiently handle very large datasets on commodity hardware, with good performance on several evaluation criteria, including speedup, scaleup, and sizeup.
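The parallelization hinges on the fact that the ELM output weights depend on the training data only through the matrices H^T H and H^T y, which are sums over samples and therefore decompose cleanly across mappers. Below is a minimal sketch of that decomposition, with the mappers simulated as chunk-wise partial sums; the sigmoid activation, ridge term, and chunk count are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: MapReduce-style decomposition of ELM regression (assumptions noted above).
import numpy as np

def elm_hidden(X, W, b):
    """Hidden-layer output H = g(XW + b), here with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def map_partial_sums(X_chunk, y_chunk, W, b):
    """Map step: each data chunk emits its partial H^T H and H^T y."""
    H = elm_hidden(X_chunk, W, b)
    return H.T @ H, H.T @ y_chunk

def reduce_and_solve(partials, reg=1e-3):
    """Reduce step: sum the partial matrices, then solve for output weights."""
    HtH = sum(p[0] for p in partials)
    Hty = sum(p[1] for p in partials)
    return np.linalg.solve(HtH + reg * np.eye(HtH.shape[0]), Hty)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)
W, b = rng.normal(size=(5, 50)), rng.normal(size=50)   # 50 random hidden nodes
chunks = np.array_split(np.arange(1000), 4)            # 4 simulated mappers
beta = reduce_and_solve([map_partial_sums(X[i], y[i], W, b) for i in chunks])
print("train RMSE:", np.sqrt(np.mean((elm_hidden(X, W, b) @ beta - y) ** 2)))
```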


Conference on Information and Knowledge Management | 2008

Transfer learning from multiple source domains via consensus regularization

Ping Luo; Fuzhen Zhuang; Hui Xiong; Yuhong Xiong; Qing He

Recent years have witnessed increased interest in transfer learning. Despite the vast amount of research in this field, challenges remain in applying knowledge learned from multiple source domains to a target domain. First, data from multiple source domains can be semantically related yet have different distributions, and it is not clear how to exploit the distribution differences among the source domains to boost learning performance in the target domain. Second, many real-world applications demand that this transfer learning be performed in a distributed manner. To meet these challenges, we propose a consensus regularization framework for transfer learning from multiple source domains to a target domain. In this framework, a local classifier is trained by considering both the local data available in a source domain and the prediction consensus with the classifiers from the other source domains. In addition, the training algorithm can be implemented in a distributed manner in which the source domains act as slave nodes and the target domain as the master node. To combine the training results from multiple source domains, only some statistics need to be shared rather than the full contents of the labeled data, which modestly relieves privacy concerns and avoids the need to upload all data to a central location. Finally, our experimental results show the effectiveness of consensus regularization learning.
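To make the consensus term concrete, here is a minimal sketch assuming the consensus is measured by the entropy of the averaged class-posterior predictions of the source classifiers on unlabeled target data: agreement yields a confident (low-entropy) average, disagreement a near-uniform (high-entropy) one. The exact regularizer and classifier form in the paper may differ.

```python
# Sketch of an entropy-based consensus penalty (an illustrative assumption).
import numpy as np

def consensus_penalty(probs_per_source):
    """probs_per_source: list of (n_target, n_classes) prediction matrices,
    one per source-domain classifier. Training would add this penalty
    (suitably weighted) to each local classifier's loss and minimize it."""
    avg = np.mean(probs_per_source, axis=0)           # consensus distribution
    ent = -np.sum(avg * np.log(avg + 1e-12), axis=1)  # per-example entropy
    return ent.mean()

# Two toy source classifiers that agree vs. disagree on 3 target examples:
p = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
agree, disagree = [p, p], [p, p[:, ::-1]]
print(consensus_penalty(agree), "<", consensus_penalty(disagree))
```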


Statistical Analysis and Data Mining | 2011

Exploiting associations between word clusters and document classes for cross-domain text categorization

Fuzhen Zhuang; Ping Luo; Hui Xiong; Qing He; Yuhong Xiong; Zhongzhi Shi

Cross-domain text categorization aims to adapt the knowledge learned from a labeled source domain to an unlabeled target domain, where the documents from the source and target domains are drawn from different distributions. However, despite the different distributions of raw word features, the associations between word clusters (conceptual features) and document classes may remain stable across domains. In this paper, we exploit these unchanged associations as the bridge for knowledge transfer from the source domain to the target domain via non-negative matrix tri-factorization. Specifically, we formulate a joint optimization framework over the two matrix tri-factorizations for the source- and target-domain data, in which the associations between word clusters and document classes are shared between them. We then give an iterative algorithm for this optimization and theoretically prove its convergence. Comprehensive experiments show the effectiveness of this method. In particular, the proposed method can handle some difficult scenarios where baseline methods usually do not perform well.
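As a hedged illustration of the joint optimization, the sketch below tri-factorizes each domain's word-document matrix as X ≈ F S G^T while sharing the word-cluster/document-class association matrix S across domains. It uses the standard multiplicative updates for non-negative tri-factorization; the paper's exact constraints and normalizations are omitted.

```python
# Sketch: joint NMF tri-factorization with a shared association matrix S.
import numpy as np

def joint_trifactorize(Xs, k, c, iters=200, eps=1e-9, seed=0):
    """Xs: list of nonnegative word-by-document matrices, one per domain.
    Factorizes each X_d ~ F_d @ S @ G_d.T with S shared across domains."""
    rng = np.random.default_rng(seed)
    Fs = [rng.random((X.shape[0], k)) for X in Xs]  # word clusters, per domain
    Gs = [rng.random((X.shape[1], c)) for X in Xs]  # document classes, per domain
    S = rng.random((k, c))                          # shared cluster-class associations
    for _ in range(iters):
        for d, X in enumerate(Xs):
            F, G = Fs[d], Gs[d]
            F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
            G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        # S aggregates evidence from every domain, acting as the bridge.
        num = sum(F.T @ X @ G for F, X, G in zip(Fs, Xs, Gs))
        den = sum(F.T @ F @ S @ G.T @ G for F, G in zip(Fs, Gs))
        S *= num / (den + eps)
    return Fs, S, Gs
```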


Neurocomputing | 2015

Learning deep representations via extreme learning machines

Wenchao Yu; Fuzhen Zhuang; Qing He; Zhongzhi Shi

The extreme learning machine (ELM), as an emerging technology, has achieved exceptional performance in large-scale settings and is well suited to binary and multi-class classification as well as regression tasks. However, existing ELMs and their variants predominantly employ single-hidden-layer feedforward networks, leaving the popular and potentially powerful stacked generalization principle unexploited for learning predictive deep representations of the input data. Deep architectures can find higher-level representations and thus potentially capture relevant higher-level abstractions, but most current deep learning methods require solving a difficult, non-convex optimization problem. In this paper, we propose a stacked model, DrELM, which learns deep representations via extreme learning machines following the stacked generalization philosophy. The model uses ELM as a base building block and incorporates random shift and kernelization as stacking elements. Specifically, in each layer, DrELM integrates a random projection of the ELM's predictions into the original features and then applies a kernel function to generate the layer's output features. To verify the classification and regression performance of DrELM, we conduct experiments on both synthetic and real-world data sets. The results show that DrELM outperforms ELM and kernel ELMs, suggesting that DrELM yields predictive features that are well suited to prediction tasks. Its performance is comparable to that of deep models (e.g., stacked auto-encoders), but thanks to the use of ELM, DrELM is easier to train and faster at test time.
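A minimal sketch of the stacking step follows, assuming a tanh "kernelization" and a small shift weight alpha; the paper's choice of kernel and constants may differ. Each layer trains an ELM on the current features, randomly projects its predictions back into the input space, adds that shift to the original features, and applies the nonlinearity.

```python
# Sketch of DrELM-style stacking (tanh nonlinearity and alpha are assumptions).
import numpy as np

rng = np.random.default_rng(0)

def elm_fit_predict(X, Y, n_hidden=60, reg=1e-2):
    """One ELM block: random hidden layer, ridge solution for output weights.
    Y: (n_samples, n_outputs) targets, e.g. one-hot class labels."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return H @ beta

def drelm_features(X, Y, n_layers=3, alpha=0.1):
    Xl = X
    for _ in range(n_layers):
        O = elm_fit_predict(Xl, Y)                     # this layer's predictions
        P = rng.normal(size=(O.shape[1], X.shape[1]))  # random projection
        Xl = np.tanh(X + alpha * (O @ P))              # shift, then kernelize
    return Xl                                          # deep representation
```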


Rough Sets and Knowledge Technology | 2010

Parallel implementation of classification algorithms based on MapReduce

Qing He; Fuzhen Zhuang; Jincheng Li; Zhongzhi Shi

Data mining has attracted extensive research for several decades. As an important data mining task, classification plays an important role in information retrieval, web search, CRM, and other applications. Most existing classification techniques are serial, which becomes impractical for large datasets: computing resources are under-utilized and execution times become unacceptably long. Using the MapReduce programming model, we propose parallel implementations of several classification algorithms, including k-nearest neighbors, naive Bayes, and decision trees. Preliminary experiments show that the proposed parallel methods can not only process large datasets but also scale out across a cluster, which significantly improves efficiency.
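The underlying MapReduce pattern is easiest to see for naive Bayes, whose training reduces to counting: mappers emit (class, feature, value) counts for their data split and reducers sum them, after which priors and conditional probabilities follow directly. The sketch below simulates mappers and reducers as plain functions; it illustrates the pattern, not the paper's Hadoop implementation.

```python
# Sketch: MapReduce-style counting for naive Bayes training.
from collections import Counter

def mapper(records):
    """records: iterable of (feature_vector, label) on one data split."""
    counts = Counter()
    for x, y in records:
        counts[("class", y)] += 1              # class prior counts
        for j, v in enumerate(x):
            counts[("cond", y, j, v)] += 1     # per-feature conditional counts
    return counts

def reducer(all_counts):
    """Sum the per-split counts into one model-count table."""
    total = Counter()
    for c in all_counts:
        total.update(c)
    return total

split1 = [((1, 0), "spam"), ((1, 1), "spam")]  # processed by mapper 1
split2 = [((0, 0), "ham")]                     # processed by mapper 2
model = reducer([mapper(split1), mapper(split2)])
print(model[("class", "spam")], model[("cond", "spam", 0, 1)])  # -> 2 2
```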


Applied Mathematics and Computation | 2013

Particle swarm optimization using dimension selection methods

Xin Jin; Yongquan Liang; Dongping Tian; Fuzhen Zhuang

Particle swarm optimization (PSO) has undergone many changes since its introduction in 1995. As a stochastic algorithm, PSO's randomness presents a formidable challenge for theoretical analysis, yet few existing PSO improvements have attempted to eliminate the random coefficients in the PSO update formula. This paper analyzes the importance of randomness in PSO and presents a PSO variant without randomness to show that traditional PSO cannot work without it. Based on this analysis, we propose another way of using randomness in the PSO with random dimension selection (PSORDS) algorithm, which uses random dimension selection instead of stochastic coefficients. Finally, we propose deterministic dimension selection methods: the resulting PSO with distance-based dimension selection (PSODDS) algorithm is greatly superior to traditional PSO, while the PSO with heuristic dimension selection (PSOHDS) algorithm is comparable to traditional PSO. In addition, applying our dimension selection method to the recently proposed modified particle swarm optimization (MPSO) algorithm also improves its results. The experimental results confirm our analysis of randomness and demonstrate that deterministic dimension selection is very helpful.
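To illustrate the distance-based selection, here is a hedged sketch of one particle's update in which only the k dimensions farthest from the global best are updated, with no random coefficients; the constants and the exact selection rule are assumptions rather than the paper's specification.

```python
# Sketch: one PSODDS-style update step (selection rule and constants assumed).
import numpy as np

def pso_dds_step(x, v, pbest, gbest, k=2, w=0.7, c1=1.5, c2=1.5):
    """Update only the k dimensions where the particle is farthest from gbest,
    replacing the usual per-dimension random coefficients."""
    sel = np.argsort(np.abs(x - gbest))[-k:]   # deterministic dimension choice
    v_new, x_new = v.copy(), x.copy()
    v_new[sel] = (w * v[sel]
                  + c1 * (pbest[sel] - x[sel])
                  + c2 * (gbest[sel] - x[sel]))
    x_new[sel] += v_new[sel]
    return x_new, v_new
```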


Neurocomputing | 2011

A parallel incremental extreme SVM classifier

Qing He; Changying Du; Qun Wang; Fuzhen Zhuang; Zhongzhi Shi

The recently proposed extreme SVM (ESVM) classifier has been shown to provide very good generalization performance in relatively short time; however, it is ill-suited to large-scale data sets due to its highly intensive computation. We therefore propose an efficient parallel ESVM (PESVM) built on the current and powerful parallel programming framework MapReduce. Furthermore, we observe that when new training data arrive, it is wasteful for ESVM to retrain a new model on all the training data (both old and newly arrived). Along this line, we develop an incremental learning algorithm for ESVM (IESVM), which meets the requirements of online learning by updating the existing model. We also provide a parallel version of IESVM (PIESVM), which addresses both the large-scale and the online problems at the same time. The experimental results show that the proposed parallel algorithms not only can handle large-scale data sets, but also scale well in terms of the evaluation metrics of speedup, sizeup, and scaleup. It is also worth mentioning that PESVM, IESVM, and PIESVM are much more efficient than ESVM while obtaining exactly the same solutions as ESVM.
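The incremental idea can be illustrated abstractly: if the solution depends on the data only through accumulated matrices U = H^T H and V = H^T T, then a new batch folds in as partial sums and re-solving gives exactly the model a full retrain would. The sketch below abstracts the ESVM-specific formulation into a ridge solve, so it is an illustration of the bookkeeping, not the paper's exact update.

```python
# Sketch: incremental updating via sufficient statistics (ESVM loss abstracted away).
import numpy as np

class IncrementalSolver:
    def __init__(self, n_hidden, reg=1e-2):
        self.U = np.zeros((n_hidden, n_hidden))  # accumulated H^T H
        self.V = None                            # accumulated H^T T
        self.reg = reg

    def update(self, H_new, T_new):
        """Fold a new batch's partial sums into the statistics (no retraining)."""
        self.U += H_new.T @ H_new
        add = H_new.T @ T_new
        self.V = add if self.V is None else self.V + add

    def solve(self):
        """Recover the current model from the accumulated statistics."""
        n = self.U.shape[0]
        return np.linalg.solve(self.U + self.reg * np.eye(n), self.V)
```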


European Conference on Machine Learning | 2013

Shared structure learning for multiple tasks with multiple views

Xin Jin; Fuzhen Zhuang; Shuhui Wang; Qing He; Zhongzhi Shi

Real-world problems usually exhibit dual-heterogeneity: every task in the problem has features from multiple views, and multiple tasks are related to each other through one or more shared views. To solve such multi-task problems with multiple views, we propose a shared structure learning framework that learns shared predictive structures on the common views from multiple related tasks and uses the consistency among different views to improve performance. An alternating optimization algorithm is derived to solve the proposed framework. Moreover, during optimization the computational load can be handled locally within each task, with only some statistics shared among tasks, which significantly reduces the time and space complexity. Experimental studies on four real-world data sets demonstrate that our framework significantly outperforms state-of-the-art baselines.


IEEE Transactions on Knowledge and Data Engineering | 2010

Cross-Domain Learning from Multiple Sources: A Consensus Regularization Perspective

Fuzhen Zhuang; Ping Luo; Hui Xiong; Yuhong Xiong; Qing He; Zhongzhi Shi

Classification across domains studies how to adapt a learning model from one domain to another that shares similar data characteristics. While there are a number of existing works along this line, many of them focus only on learning from a single source domain to a target domain. In particular, a remaining challenge is how to apply the knowledge learned from multiple source domains to a target domain. Indeed, data from multiple source domains can be semantically related but have different data distributions, and it is not clear how to exploit these distribution differences to boost learning performance in the target domain. To that end, in this paper we propose a consensus regularization framework for learning from multiple source domains to a target domain. In this framework, a local classifier is trained by considering both the local data available in one source domain and the prediction consensus with the classifiers learned from the other source domains. Moreover, we provide a theoretical analysis as well as an empirical study of the proposed framework. The experimental results on text categorization and image classification problems show the effectiveness of this consensus regularization learning method. Finally, to deal with the situation where the multiple source domains are geographically distributed, we also develop a distributed version of the algorithm, which avoids the need to upload all the data to a centralized location and helps to mitigate privacy concerns.


IEEE Transactions on Knowledge and Data Engineering | 2012

Erratum to "Mining Distinction and Commonality across Multiple Domains Using Generative Model for Text Classification"

Fuzhen Zhuang; Ping Luo; Zhiyong Shen; Qing He; Yuhong Xiong; Zhongzhi Shi; Hui Xiong

The distribution differences among multiple domains have been exploited for cross-domain text categorization in recent years. Along this line, we make two new observations in this study. First, the difference in data distributions often stems from different domains using different index words to express the same concept. Second, the association between a conceptual feature and a document class can be stable across domains. These two observations indicate, respectively, the distinction and the commonality across domains. Inspired by them, we propose a generative statistical model, Collaborative Dual-PLSA (CD-PLSA), to simultaneously capture both the domain distinction and the commonality among multiple domains. Unlike Probabilistic Latent Semantic Analysis (PLSA), which has only one latent variable, the proposed model has two latent factors, y and z, corresponding to word concept and document class, respectively. The shared commonality intertwines with the distinctions over multiple domains and also serves as the bridge for knowledge transfer. An Expectation-Maximization (EM) algorithm is developed to solve the CD-PLSA model, and a distributed version is further developed to avoid uploading all the raw data to a centralized location and to help mitigate privacy concerns. After the training phase with all the data from multiple domains, we propose to refine the intermediate outputs using only the corresponding local data. In summary, we propose a two-phase method for cross-domain text classification: the first phase performs collaborative training with all the data, and the second performs local refinement. Finally, we conduct extensive experiments over hundreds of classification tasks with multiple source and target domains to validate the superiority of the proposed method over existing state-of-the-art supervised and transfer learning methods. Notably, as the experimental results show, CD-PLSA's collaborative training is more tolerant of distribution differences, and the local refinement also yields significant improvement in classification accuracy.
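For intuition, the sketch below implements one EM step for a dual-latent-variable co-occurrence model in the spirit of CD-PLSA: word concept y and document class z, with the concept-class association P(y, z) shared across domains while P(w|y, r) and P(d|z, r) remain domain-specific. The parameterization is inferred from the abstract, so treat this as an assumption-laden illustration rather than the paper's exact model.

```python
# Sketch: one EM step for a CD-PLSA-like model (parameterization assumed).
import numpy as np

def em_step(Ns, Pw_y, Pd_z, Pyz, eps=1e-12):
    """Ns: word-by-document count matrices, one per domain r.
    Pw_y[r]: (W, Y) = P(w|y, r); Pd_z[r]: (D, Z) = P(d|z, r);
    Pyz: (Y, Z) = P(y, z), the commonality shared across all domains."""
    new_Pyz = np.zeros_like(Pyz)
    for r, N in enumerate(Ns):
        # E-step: posterior P(y, z | w, d, r) for every (w, d) cell.
        post = np.einsum('wy,dz,yz->wdyz', Pw_y[r], Pd_z[r], Pyz)
        post /= post.sum(axis=(2, 3), keepdims=True) + eps
        weighted = N[:, :, None, None] * post
        # M-step: re-estimate the domain-specific factors.
        Pw_y[r] = weighted.sum(axis=(1, 3))
        Pw_y[r] /= Pw_y[r].sum(axis=0, keepdims=True) + eps
        Pd_z[r] = weighted.sum(axis=(0, 2))
        Pd_z[r] /= Pd_z[r].sum(axis=0, keepdims=True) + eps
        new_Pyz += weighted.sum(axis=(0, 1))   # pooled across domains
    return Pw_y, Pd_z, new_Pyz / (new_Pyz.sum() + eps)
```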

Collaboration


Dive into Fuzhen Zhuang's collaborations.

Top Co-Authors

Qing He (Chinese Academy of Sciences)
Zhongzhi Shi (Chinese Academy of Sciences)
Ping Luo (Chinese Academy of Sciences)
Changying Du (Chinese Academy of Sciences)
Xiang Ao (Chinese Academy of Sciences)
Chuan Shi (Beijing University of Posts and Telecommunications)
Xin Jin (Chinese Academy of Sciences)
Jia He (Chinese Academy of Sciences)
Wenjuan Luo (Chinese Academy of Sciences)