Zhanhuai Li

Northwestern Polytechnical University

Publication

Featured research published by Zhanhuai Li.

Advanced Data Mining and Applications | 2008

Sequential Pattern Mining for Protein Function Prediction

Miao Wang; Xuequn Shang; Zhanhuai Li

Predicting the function of protein sequences is one of the problems arising from recent progress in bioinformatics. Traditional methods have their limits. We present a novel method of protein sequence function prediction based on sequential pattern mining. First, we use sequential pattern mining algorithms of our own design to mine a dataset of sequences with known function. Then, we build a classifier from the generated patterns to predict the function of protein sequences. Experiments confirm the effectiveness of our method.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2004

DRC-BK: Mining Classification Rules with Help of SVM

Yang Zhang; Zhanhuai Li; Yan Tang; Kebin Cui

Currently, the accuracy of SVM classifiers is very high, but their classification model is not understandable by human experts. In this paper, we use an SVM with a Boolean kernel to construct a hyperplane for classification, and mine classification rules from this hyperplane. In this way, we build DRC-BK, a decision rule classifier. Experimental results show that DRC-BK has higher accuracy than some state-of-the-art decision rule (decision tree) classifiers, such as C4.5, CBA, CMAR, and CAEP.

International Conference on Computational Science and Its Applications | 2005

DRC-BK: Mining Classification Rules by Using Boolean Kernels

Yang Zhang; Zhanhuai Li; Kebin Cui

Understandable classification models are very useful to human experts. Currently, SVM classifiers have good classification performance; however, their classification model is not understandable. In this paper, we build DRC-BK, a decision rule classifier based on structural risk minimization theory. Experimental results on UCI datasets and the Reuters-21578 dataset show that DRC-BK has excellent classification performance and scalability, and that DRC-BK performs best when applied with the MPDNF kernel.

RSEISP '07 Proceedings of the international conference on Rough Sets and Intelligent Systems Paradigms | 2007

An Improved SVM Classifier for Medical Image Classification

Yun Jiang; Zhanhuai Li; Longbo Zhang; Peng Sun

The Support Vector Machine (SVM) has high classification accuracy and good fault tolerance and generalization. The Rough Set Theory (RST) approach is well suited to handling large amounts of data and eliminating redundant information. In this paper, we combine an SVM classifier with RST, which we call the Improved Support Vector Machine (ISVM), to classify digital mammograms. The experimental results show that the ISVM classifier achieves 96.56% accuracy, about 3.42% higher than the 92.94% obtained using SVM, and that the error recognition rates are close to 100% on average.

Fuzzy Systems and Knowledge Discovery | 2006

Classifying Noisy Data Streams

Yong Wang; Zhanhuai Li; Yang Zhang

The two main challenges associated with mining data streams are concept drift and data noise. Current algorithms mainly depend on the robustness of the base classifier or on learning ensembles, and have no active mechanism for dealing with noise. As a result, noise can still induce drastic drops in accuracy. In this paper, we present a clustering-based method to filter out hard instances and noisy instances from data streams. We also propose a trigger to detect concept drift and build RobustBoosting, an ensemble classifier, by boosting the hard instances. We evaluated the RobustBoosting algorithm and the AdaptiveBoosting algorithm [1] on synthetic and real-life data sets. The experimental results show that the proposed method has a substantial advantage over the AdaptiveBoosting algorithm in prediction accuracy, and that it can converge to target concepts efficiently with high accuracy on datasets with noise levels as high as 40%.

Advanced Data Mining and Applications | 2006

Improving the Performance of Data Stream Classifiers by Mining Recurring Contexts

Yong Wang; Zhanhuai Li; Yang Zhang; Longbo Zhang; Yun Jiang

Traditional research on data stream mining emphasizes only building classifiers with high accuracy, which results in classifiers whose accuracy drops dramatically when the concept drifts. In this paper, we present our RTRC system, which maintains good classification accuracy when the concept drifts, provided enough samples have been scanned in the data stream. By using a Markov chain and the least-squares method, the system is able to predict not only which concept will come next but also when the concept will drift. Experimental results confirm the advantages of our system over Weighted Bagging and CVFDT, two representative systems in streaming data mining.
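As an illustration of the Markov chain idea mentioned in this abstract, the following is a minimal sketch, not the RTRC system itself. It assumes that past concepts have already been identified and labeled, and it estimates first-order transition counts to guess the next concept. The function names `fit_transitions` and `predict_next` are hypothetical, and the least-squares prediction of drift timing is not covered.

```python
from collections import Counter, defaultdict


def fit_transitions(history):
    """Count first-order transitions between consecutive concept labels."""
    counts = defaultdict(Counter)
    for current, following in zip(history, history[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequently observed successor of `current`, or None."""
    followers = counts.get(current)
    if not followers:
        return None  # concept never seen with a successor
    return followers.most_common(1)[0][0]


# Hypothetical history of concept labels observed in a stream.
history = ["A", "B", "A", "B", "A", "C"]
counts = fit_transitions(history)
print(predict_next(counts, "A"))
```

Under these assumptions, a classifier trained earlier for the predicted concept could be prepared before the drift occurs.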

International Conference on Intelligent Computing | 2007

New Sampling-Based Summary Structures for Sliding Windows over Data Streams

Longbo Zhang; Zhanhuai Li; Min Yu; Guangyuan Zhao

The main focus in algorithms has been on efficient construction of summary structures for data streams. This paper introduces the problem of constructing summary structures from sliding windows over data streams, and presents a new sampling-based summary structure and new techniques for its fast incremental maintenance. When a new data item v_i arrives, a key k_i is calculated and a random number X_i is generated. The key k_i is used to determine whether v_i will be selected to enter the sample, and X_i is used to determine how many data items will be skipped over. The experiments show that the new algorithm is effective and efficient for constructing summary structures from sliding windows over data streams.
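The abstract does not give enough detail to reproduce the paper's structure, so the sketch below substitutes a different, well-known technique: priority sampling over a count-based sliding window. Each arriving item receives a random priority, and the sample is the item with the highest priority still inside the window. The class name `PrioritySampleWindow` and its parameters are assumptions for illustration, and the skip-count mechanism described above is not modeled.

```python
import random
from collections import deque


class PrioritySampleWindow:
    """Keep one uniform random sample over the last `window` items."""

    def __init__(self, window, rng=None):
        self.window = window
        self.rng = rng or random.Random()
        # (index, priority, value); priorities strictly decrease left to right.
        self.candidates = deque()
        self.count = 0

    def add(self, value):
        priority = self.rng.random()
        # Older items with lower priority can never become the sample again.
        while self.candidates and self.candidates[-1][1] <= priority:
            self.candidates.pop()
        self.candidates.append((self.count, priority, value))
        self.count += 1
        # Expire candidates that have slid out of the window.
        oldest_valid = self.count - self.window
        while self.candidates[0][0] < oldest_valid:
            self.candidates.popleft()

    def sample(self):
        """Return the in-window item with the highest priority."""
        return self.candidates[0][2] if self.candidates else None


sampler = PrioritySampleWindow(window=10, rng=random.Random(0))
for item in range(100):
    sampler.add(item)
print(sampler.sample())  # always one of the last 10 items
```

Only items that could still become the sample are stored, so the structure holds far fewer than `window` items in expectation.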

International Journal of Knowledge Discovery in Bioinformatics | 2010

Efficient Mining Frequent Closed Discriminative Biclusters by Sample-Growth: The FDCluster Approach

Miao Wang; Xuequn Shang; Shaohua Zhang; Zhanhuai Li

DNA microarray technology has generated a large amount of gene expression data. Biclustering is a methodology that clusters the condition set and the gene set simultaneously. It finds clusters of genes possessing similar characteristics together with the biological conditions creating these similarities. Almost all current biclustering algorithms find biclusters in a single microarray dataset. To reduce the influence of noise and find more biologically meaningful biclusters, the authors propose the FDCluster algorithm to mine frequent closed discriminative biclusters in multiple microarray datasets. FDCluster uses the Apriori property and several novel pruning techniques to mine biclusters efficiently. To improve space efficiency, FDCluster also utilizes several techniques to generate frequent closed biclusters without maintaining candidates in memory. The experimental results show that FDCluster is more effective than traditional methods on both single and multiple microarray datasets. This paper tests biological significance using GO to show that the proposed method is able to produce biologically relevant biclusters.

International Conference on Intelligent Computing | 2009

The Design of Finite State Machine for Asynchronous Replication Protocol

Yanlong Wang; Zhanhuai Li; Wei Lin; Minglei Hei; Jianhua Hao

Data replication is a key way to design a disaster tolerance system and to achieve reliability and availability. It is difficult for a replication protocol to deal with diverse and complex environments, which means that data is less well replicated than it ought to be. To reduce data loss and to optimize replication protocols, we (1) present a finite state machine, (2) run it to manage an asynchronous replication protocol, and (3) report a simple evaluation of the asynchronous replication protocol based on our state machine. The evaluation shows that our state machine can keep the asynchronous replication protocol running in the proper state to the largest extent across the various events that may occur. It can also help to build replication-based disaster tolerance systems that ensure business continuity.
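The abstract does not list the states or events of the authors' machine, so the following is only a minimal sketch of how a table-driven finite state machine might manage an asynchronous replication session. Every state name, event name, and transition below is hypothetical and chosen for illustration.

```python
from enum import Enum, auto


class State(Enum):
    INITIAL_SYNC = auto()  # copying the full data set to the replica
    REPLICATING = auto()   # shipping updates asynchronously
    LOGGING = auto()       # link is down, updates are logged locally
    RESYNC = auto()        # replaying logged updates after recovery


class Event(Enum):
    SYNC_DONE = auto()
    LINK_DOWN = auto()
    LINK_UP = auto()
    RESYNC_DONE = auto()


# Hypothetical transition table: (state, event) -> next state.
TRANSITIONS = {
    (State.INITIAL_SYNC, Event.SYNC_DONE): State.REPLICATING,
    (State.INITIAL_SYNC, Event.LINK_DOWN): State.LOGGING,
    (State.REPLICATING, Event.LINK_DOWN): State.LOGGING,
    (State.LOGGING, Event.LINK_UP): State.RESYNC,
    (State.RESYNC, Event.RESYNC_DONE): State.REPLICATING,
    (State.RESYNC, Event.LINK_DOWN): State.LOGGING,
}


def step(state, event):
    """Apply one event; events with no listed transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=State.INITIAL_SYNC):
    """Feed a sequence of events through the machine and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state


final = run([Event.SYNC_DONE, Event.LINK_DOWN, Event.LINK_UP, Event.RESYNC_DONE])
print(final.name)
```

Keeping the transitions in a single table makes it straightforward to check that every event has a defined outcome in every state.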

Web Information Systems Engineering | 2006

Supporting Complex Query with Structured Overlays in Schema-Based P2P System

Min Yu; Zhanhuai Li; Longbo Zhang

Despite their advantages in scalability and routing efficiency, structured peer-to-peer (P2P) overlay networks fail to support complex queries in a network of peers with heterogeneous schemas, which limits their use in schema-based P2P systems. By using relation keywords as the index key for schema information and partitioning tuples vertically, we design a method of indexing both schema and data with a structured overlay. We also propose an algorithm based on these two levels of indices to support complex queries on multiple attributes. Qualitative analysis and comparison show that this work is closer to the goal of P2P data management than other projects.
