Yanbo J. Wang
University of Liverpool
Publication
Featured research published by Yanbo J. Wang.
International Conference on Data Mining | 2007
Yanbo J. Wang; Qin Xin; Frans Coenen
Classification association rule mining (CARM) is a recent classification rule mining approach that builds an association rule mining based classifier using classification association rules (CARs). Regardless of which particular CARM algorithm is used, a similar set of CARs is always generated from data, and a classifier is usually presented as an ordered CAR list, based on a selected rule ordering strategy. In the past decade, a number of rule ordering strategies have been introduced that can be categorized under three headings: (1) support-confidence, (2) rule weighting, and (3) hybrid. In this paper, we propose an alternative rule-weighting scheme, namely CISRW (class-item score based rule weighting), and develop a rule-weighting based rule ordering mechanism based on CISRW. Subsequently, two hybrid strategies are further introduced by combining (1) and CISRW. The experimental results show that the three proposed CISRW based/related rule ordering strategies perform well with respect to the accuracy of classification.
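The support-confidence heading mentioned in this abstract can be illustrated with a minimal sketch. It is not the CISRW scheme, which the abstract does not define. The `CAR` class, the tie-breaking on antecedent size, and the toy rules are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class CAR:
    """Hypothetical container for one rule: antecedent items -> class label."""
    antecedent: FrozenSet[str]
    label: str
    support: float
    confidence: float


def order_by_confidence_support(rules: List[CAR]) -> List[CAR]:
    # Higher confidence first, then higher support, then shorter antecedent.
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))


def classify(record: FrozenSet[str], ordered: List[CAR],
             default: Optional[str] = None) -> Optional[str]:
    # Fire the first rule in the ordered list whose antecedent the record contains.
    for rule in ordered:
        if rule.antecedent <= record:
            return rule.label
    return default


# Toy rules with made-up support and confidence values.
rules = [
    CAR(frozenset({"a"}), "x", support=0.40, confidence=0.70),
    CAR(frozenset({"a", "b"}), "y", support=0.20, confidence=0.90),
    CAR(frozenset({"c"}), "x", support=0.30, confidence=0.90),
]
ordered = order_by_confidence_support(rules)
```

With this ordering, a record containing both `a` and `b` is matched by the more confident two-item rule before the general rule on `a` alone is reached, which is why the position of a rule in the list matters for classification.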
International Conference on Data Mining | 2009
Omar Baqueiro; Yanbo J. Wang; Peter McBurney; Frans Coenen
In this paper, we introduce an integration study which combines Data Mining (DM) and Agent Based Modeling and Simulation (ABMS). This study, as a new paradigm for DM/ABMS, is concerned with two approaches: (i) applying DM techniques in ABMS investigation, and inversely (ii) utilizing ABMS results in DM research. A detailed description of each approach is presented in this paper. Conclusions and future work for this integration study are given at the end.
Machine Learning and Data Mining in Pattern Recognition | 2007
Frans Coenen; Paul H. Leng; Robert Sanderson; Yanbo J. Wang
Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyword-generation methods and phrase-construction strategies that identify key words and phrases by simple, language-independent statistical properties. We present results that demonstrate that these methods can produce good classification accuracy, with the best results being obtained using a phrase-based approach.
International Conference on Data Mining | 2009
Weiqi Wang; Yanbo J. Wang; René Bañares-Alcántara; Zhanfeng Cui; Frans Coenen
In this paper, data mining is used to analyze the differentiation of mammalian Mesenchymal Stem Cells (MSCs). A database comprising the key parameters which, we believe, influence the destiny of mammalian MSCs has been constructed. This paper introduces Classification Association Rule Mining (CARM) as a data mining technique in the domain of tissue engineering and initiates a promising new research field. The experimental results show that the proposed approach performs well with respect to the accuracy of (classification) prediction. Moreover, some of the rules mined from the constructed MSC database were found to be meaningful and useful.
Data Mining: Foundations and Practice | 2008
Yanbo J. Wang; Qin Xin; Frans Coenen
Classification Rule Mining (CRM) is a well-known Data Mining technique for the extraction of hidden Classification Rules (CRs) from a given database that is coupled with a set of pre-defined classes, the objective being to build a classifier to classify “unseen” data-records. One recent approach to CRM is to employ Association Rule Mining (ARM) techniques to identify the desired CRs, i.e. Classification Association Rule Mining (CARM). Although the advantages of accuracy and efficiency offered by CARM have been established in many papers, one major drawback is the large number of Classification Association Rules (CARs) that may be generated: up to a maximum of $2^n - n - 1$ in the worst case, where $n$ represents the number of data-attributes in a database. However, only a limited number of CARs, say at most $k$ in each class, are required to distinguish between classes. The problem addressed in this chapter is how to efficiently identify these $k$ CARs. Given a CAR list that is generated from a given database, based on the well-established “Support-Confidence” framework, a rule weighting scheme is proposed in this chapter, which assigns a score to a CAR that evaluates how significantly this CAR contributes to a single pre-defined class. Consequently, a rule mining approach is presented that addresses the above problem and operates in time $O(\hat{k}^2 n^2)$ in its deterministic fashion, and $O(\hat{k} n)$ in its randomised fashion, where $\hat{k}$ represents the number of CARs in each class that are potentially significant to distinguish between classes and $\hat{k} \geq k$. This contrasts with the exponential time $O(2^n)$ required in score computation to mine all $k$ CARs in a “one-by-one” manner. The experimental results show good performance regarding the accuracy of classification when using the proposed rule weighting scheme with a suggested rule ordering mechanism, and provide evidence that the proposed rule mining approach performs well with respect to the efficiency of computation.
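The worst-case bound of 2^n − n − 1 quoted in this abstract can be checked numerically. The sketch below assumes the bound counts every subset of the n data-attributes that contains at least two attributes (all 2^n subsets, minus the empty set, minus the n singletons); the abstract itself does not spell out this derivation, and the function names are hypothetical.

```python
from itertools import combinations


def count_itemsets_of_size_at_least_two(n: int) -> int:
    # Enumerate every subset of n attributes that has two or more members.
    attributes = range(n)
    return sum(1 for size in range(2, n + 1) for _ in combinations(attributes, size))


def closed_form(n: int) -> int:
    # All subsets (2^n) minus the n singletons minus the empty set.
    return 2 ** n - n - 1


# The brute-force count agrees with the closed form for small n.
for n in range(1, 11):
    assert count_itemsets_of_size_at_least_two(n) == closed_form(n)
```

Even at n = 20 the closed form already exceeds one million candidate rules, which is the motivation for mining only a limited number of significant rules per class.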
International Conference on Innovative Techniques and Applications of Artificial Intelligence | 2007
Yanbo J. Wang; Frans Coenen; Paul H. Leng; Robert Sanderson
A number of language-independent text pre-processing techniques, to support multi-class single-label text classification, are described and compared. A simple but effective statistical keyword identification approach is proposed, coupled with a number of phrase identification mechanisms. Experimental results are presented.
International Conference on Data Mining | 2010
Yanbo J. Wang; Fan Li; Frans Coenen; Robert Sanderson; Qin Xin
Textual Feature Selection (TFS) is an important phase in the process of text classification. It aims to identify the most significant textual features (i.e. key words and/or phrases), in a textual dataset, that serve to distinguish between text categories. In TFS, basic techniques can be divided into two groups: linguistic vs. statistical. For the purpose of building a language-independent text classifier, the study reported here is concerned with statistical TFS only. In this paper, we propose a novel statistical TFS approach that hybridizes the ideas of two existing techniques, DIAAF (Darmstadt Indexing Approach Association Factor) and RS (Relevancy Score). With respect to associative (text) classification, the experimental results demonstrate that the proposed approach can produce greater classification accuracy than other alternative approaches.
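As a rough illustration of statistical TFS, the sketch below scores terms with the DIA association factor, assuming its common description as the estimated probability of a class given that a document contains the term. It does not implement the hybrid approach proposed in the paper, which the abstract does not specify, and it omits RS. The function name and the toy documents are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def dia_association_factor(
    docs: List[Tuple[Set[str], str]]
) -> Dict[Tuple[str, str], float]:
    """Estimate P(class | term) from document counts (assumed definition)."""
    term_count: Counter = Counter()
    term_class_count: Counter = Counter()
    for terms, label in docs:
        for term in terms:
            term_count[term] += 1                  # documents containing the term
            term_class_count[(term, label)] += 1   # ...that also belong to the class
    return {(t, c): n / term_count[t] for (t, c), n in term_class_count.items()}


# Toy corpus: each document is a set of words plus a class label.
docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "vote"}, "politics"),
    ({"match", "team"}, "sport"),
]
scores = dia_association_factor(docs)
```

In this toy corpus `match` appears only in sport documents and so receives the highest possible score for that class, while `goal` is split evenly between the two classes and would be a weaker distinguishing feature. No linguistic analysis is involved, which is what makes scores of this kind language-independent.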
International Conference on Computer Application and System Modeling | 2010
Weiqi Wang; René Bañares-Alcántara; Zhanfeng Cui; Yanbo J. Wang; Frans Coenen
In this paper, a software toolkit has been developed for in silico prediction of the differentiation destiny of Mesenchymal Stem Cells (MSCs) in vitro. The software toolkit was developed in CLIPS (C Language Integrated Production System) as an expert system, with a Java-based GUI. The toolkit makes its predictions using rules obtained from previous experimental data via data mining techniques. Thus, the prediction accuracy can be affected by the number and quality of the rules, which can be improved by both manual adjustment and expansion of the MSC differentiation database in the future.
Machine Learning and Data Mining in Pattern Recognition | 2007
Yanbo J. Wang; Qin Xin; Frans Coenen
Machine Learning and Data Mining in Pattern Recognition | 2008
Yanbo J. Wang; Qin Xin; Frans Coenen