Minghao Piao | Researchain

Publication

Featured research published by Minghao Piao.

Bioinformatics | 2012

An ensemble correlation-based gene selection algorithm for cancer classification with gene expression data

Yongjun Piao; Minghao Piao; Kiejung Park; Keun Ho Ryu

Motivation: Gene selection for cancer classification is one of the most important topics in the biomedical field. However, microarray data pose a severe challenge for computational techniques. We need dimension reduction techniques that identify a small set of genes to achieve better learning performance. From the perspective of machine learning, the selection of genes can be considered to be a feature selection problem that aims to find a small subset of features that has the most discriminative information for the target. Results: In this article, we proposed an Ensemble Correlation-Based Gene Selection algorithm based on symmetrical uncertainty and Support Vector Machine. In our method, symmetrical uncertainty was used to analyze the relevance of the genes, the different starting points of the relevant subset were used to generate the gene subsets and the Support Vector Machine was used as an evaluation criterion of the wrapper. The efficiency and effectiveness of our method were demonstrated through comparisons with other feature selection techniques, and the results show that our method outperformed other methods published in the literature.
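As a rough illustration of the relevance measure named in this abstract, the sketch below computes symmetrical uncertainty in its standard form, SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), for discretized features. It is not the authors' implementation: the function names and the toy gene and label values are assumptions made for the example, and the subset generation and SVM wrapper steps are not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); ranges from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy     # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


# Hypothetical discretized expression levels and class labels (illustration only).
labels = ["tumor", "tumor", "normal", "normal"]
gene_a = ["high", "high", "low", "low"]   # tracks the class exactly
gene_b = ["high", "low", "high", "low"]   # independent of the class

print(symmetrical_uncertainty(gene_a, labels))  # 1.0
print(symmetrical_uncertainty(gene_b, labels))  # 0.0
```

A gene with SU close to 1 against the class label is highly relevant, while a value close to 0 suggests it carries little class information.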

Mathematical Problems in Engineering | 2015

A New Ensemble Method with Feature Space Partitioning for High-Dimensional Data Classification

Yongjun Piao; Minghao Piao; Cheng Hao Jin; Ho Sun Shon; Ji-Moon Chung; Buhyun Hwang; Keun Ho Ryu

Ensemble data mining methods, also known as classifier combination, are often used to improve the performance of classification. Various classifier combination methods such as bagging, boosting, and random forest have been devised and have received considerable attention in the past. However, data dimensionality keeps increasing rapidly, which poses challenges because these methods are not suited to direct application to high-dimensional datasets. In this paper, we propose an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by partitioning of redundant features. In our method, the redundancy of features is considered to divide the original feature space. Then, each generated feature subset is trained by a support vector machine, and the results of each classifier are combined by majority voting. The efficiency and effectiveness of our method are demonstrated through comparisons with other ensemble techniques, and the results show that our method outperforms other methods.
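The combination step mentioned in this abstract, majority voting over classifiers trained on different feature subsets, can be sketched as follows. This covers only the voting; the redundancy-based partitioning and the SVM training are not reproduced, and the function name and the toy predictions are assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label predictions by majority vote.

    predictions_per_classifier holds one list per classifier, and all
    lists are aligned by sample index.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        # most_common(1) yields the label with the highest count
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical predictions from three classifiers, each assumed to be
# trained on a different feature subset, for three samples.
predictions = [
    ["A", "B", "A"],
    ["A", "A", "B"],
    ["B", "B", "B"],
]
print(majority_vote(predictions))  # ['A', 'B', 'B']
```

Using an odd number of classifiers avoids ties in a two-class problem.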

Journal of Information Processing Systems | 2010

IMTAR: Incremental Mining of General Temporal Association Rules

Anour F. A. Dafa-Alla; Ho Sun Shon; Khalid E. K. Saeed; Minghao Piao; Unil Yun; Kyung Joo Cheoi; Keun Ho Ryu

Nowadays, due to the rapid advances in the field of information systems, transactional databases are being updated regularly and/or periodically. The knowledge discovered from these databases has to be maintained, and an incremental updating technique needs to be developed for maintaining the discovered association rules from these databases. The concept of Temporal Association Rules has been introduced to solve the problem of handling time series by including time expressions into association rules. In this paper we introduce a novel algorithm for Incremental Mining of General Temporal Association Rules (IMTAR) using an extended TFP-tree. The main benefits of our algorithm are that it offers significant advantages in terms of storage and running time, and that it can handle the problem of mining general temporal association rules in incremental databases by building TFP-trees incrementally. It can be applied to real-life application domains. We demonstrate our algorithm and its advantages in this paper.
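To make the idea of incremental maintenance concrete, the sketch below keeps per-period itemset counts that are updated as each new transaction arrives, so support over any set of time periods can be read off without rescanning old data. It deliberately uses plain dictionaries rather than the TFP-tree described in the abstract; the class name, the period labels, and the `max_size` parameter are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


class IncrementalSupport:
    """Per-period itemset counts, updated one transaction at a time."""

    def __init__(self, max_size=2):
        self.max_size = max_size  # largest itemset size that is tracked
        self.counts = defaultdict(lambda: defaultdict(int))  # period -> itemset -> count
        self.totals = defaultdict(int)                       # period -> transaction count

    def add(self, period, transaction):
        """Register a new transaction without touching earlier data."""
        items = sorted(set(transaction))
        self.totals[period] += 1
        for k in range(1, self.max_size + 1):
            for itemset in combinations(items, k):
                self.counts[period][itemset] += 1

    def support(self, itemset, periods):
        """Fraction of transactions in the given periods that contain itemset."""
        key = tuple(sorted(itemset))
        total = sum(self.totals.get(p, 0) for p in periods)
        if total == 0:
            return 0.0
        hits = sum(self.counts[p].get(key, 0) for p in periods if p in self.counts)
        return hits / total


store = IncrementalSupport()
store.add("2010-01", ["bread", "milk"])
store.add("2010-01", ["bread"])
print(store.support(["bread", "milk"], ["2010-01"]))  # 0.5

# A later increment only adds to the counts; nothing is recomputed.
store.add("2010-02", ["milk", "bread"])
print(store.support(["bread", "milk"], ["2010-01", "2010-02"]))  # 2 of 3 transactions
```

Enumerating every itemset up to `max_size` is only practical for short transactions; compact tree structures exist to avoid that cost.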

International Conference on Intelligent Computing | 2008

Application of Classification Methods for Forecasting Mid-Term Power Load Patterns

Minghao Piao; Heon Gyu Lee; Jin Hyoung Park; Keun Ho Ryu

An automated methodology based on data mining techniques is presented for predicting customer load patterns in long-duration load profiles. The proposed approach consists of three stages: (i) data preprocessing, in which noise and outliers are removed and continuous attribute values are transformed to discrete values; (ii) cluster analysis, in which k-means clustering is used to create load pattern classes and the representative load profile for each class; and (iii) classification, in which several supervised learning methods are evaluated in order to select a suitable prediction method. Power load measured by an AMR (automatic meter reading) system, together with customer indexes, was used as input for clustering. The output of clustering was the set of representative load profiles (or classes). To evaluate the forecasting of load patterns, several classification methods were applied to a set of high-voltage customers of the Korean power system, with the class labels derived from clustering and other features used as input to build the classifiers. Finally, the results of our experiments are presented.
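Stage (ii) of the approach, in which k-means turns load profiles into pattern classes that later serve as class labels, might look roughly like the sketch below. It is a plain k-means with a simplified, deterministic initialization (the first k points), not the configuration used in the paper; the load values are invented for the example.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means. Initial centroids are the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical load profiles (four readings per customer, arbitrary units).
profiles = [
    [1.0, 1.2, 0.9, 1.1],
    [5.0, 5.2, 4.9, 5.1],
    [1.1, 1.0, 1.0, 1.2],
    [5.1, 4.8, 5.0, 5.2],
]
class_labels, representative_profiles = kmeans(profiles, k=2)
print(class_labels)  # [0, 1, 0, 1]
```

The returned centroids play the role of representative load profiles, and the cluster indices can be attached to each customer as the target for the classification stage.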

Advanced Data Mining and Applications | 2009

Discovery of Significant Classification Rules from Incrementally Inducted Decision Tree Ensemble for Diagnosis of Disease

Minghao Piao; Jong Bum Lee; Khalid E. K. Saeed; Keun Ho Ryu

Previous studies show that using significant classification rules to accomplish the classification task is suitable for biomedical research. Many significant rules can be discovered by using ensemble methods in decision tree induction. However, those traditional approaches are not suited to incremental tasks. In this paper, we use an ensemble method named Cascading and Sharing to derive many significant classification rules from incrementally induced decision trees and to improve the classifier's accuracy.

Symmetry | 2016

A Data Mining Approach for Cardiovascular Disease Diagnosis Using Heart Rate Variability and Images of Carotid Arteries

Hyeongsoo Kim; Musa Ibrahim M. Ishag; Minghao Piao; Taeil Kwon; Keun Ho Ryu

In this paper, we proposed not only a methodology for extracting multiple feature vectors from ultrasound images of carotid arteries (CAs) and from the heart rate variability (HRV) of the electrocardiogram signal, but also a suitable and reliable prediction model for the diagnosis of cardiovascular disease (CVD). To construct the multiple feature vectors, we extract a candidate feature vector through image processing and measurement of the carotid intima-media thickness (IMT). As a complement, linear and/or nonlinear feature vectors are also extracted from HRV, a main index of cardiac disorder. The significance of the multiple feature vectors is tested with several machine learning methods, namely Neural Networks, Support Vector Machine (SVM), Classification based on Multiple Association Rule (CMAR), Decision tree induction and Bayesian classifier. As a result, the multiple feature vectors extracted from both CAs and HRV (CA+HRV) showed higher accuracy than the separate feature vectors of CAs and HRV. Furthermore, SVM and CMAR achieved diagnostic accuracies of about 89.51% and 89.46%, respectively, when the diagnosis or prediction methods were evaluated using the finally chosen multiple feature vectors. Therefore, the multiple feature vectors devised in this paper can be effective diagnostic indicators of CVD. In addition, the feature vector analysis and prediction techniques are expected to be helpful tools in the decisions of cardiologists.

International Conference on Computer Engineering and Technology | 2010

A data mining approach for dyslipidemia disease prediction using carotid arterial feature vectors

Minghao Piao; Heon Gyu Lee; Gouchol Pok; Keun Ho Ryu

In this paper, we proposed a methodology for the diagnosis of dyslipidemia using various novel features of carotid arterial wall thickness. We measured the intima-media thickness of the carotid arteries and used the measurements as diagnostic feature vectors. To evaluate the extracted features, we tested five classification methods and compared the performance of the classifiers. As a result, the SVM and Neural Network algorithms (about 92%–98% goodness of fit) outperformed the other classifiers on the selected features.

Osong Public Health and Research Perspectives | 2014

A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs

Feifei Li; Minghao Piao; Yongjun Piao; Meijing Li; Keun Ho Ryu

Objectives: Many studies based on microRNA (miRNA) expression profiles have shown a new aspect of cancer classification. Because miRNA expression data are high-dimensional, feature selection methods have been used to facilitate dimensionality reduction. These feature selection methods have one shortcoming thus far: they consider only the cases where the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such miRNAs are ranked low by traditional feature selection methods and are removed most of the time. Given the limited number of miRNAs, low-ranking miRNAs are also important to cancer classification. Methods: We considered both high- and low-ranking features to cover all cases (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifier to build cancer classifiers. Then, we chose the Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to determine cancer classification. Results: The low-ranking miRNA expression profiles achieved higher classification accuracy compared with just using high-ranking miRNAs in traditional feature selection methods. Conclusion: Our results demonstrate that low-ranking miRNAs in the m:n feature subset have a positive effect on cancer classification.
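One of the ranking criteria listed in the Methods, the Chi-square test, can be illustrated with the standard Pearson statistic between a discretized feature and the class label. This is a generic sketch and not the authors' pipeline: the function name, the miRNA identifiers `mir_a` and `mir_b`, and the toy values are assumptions made for the example, and how the m:n subset is assembled from such rankings is not shown.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic between a discrete feature and class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in feature_counts.items():
        for l, lc in label_counts.items():
            expected = fc * lc / n  # expected count under independence
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


# Hypothetical discretized miRNA expression levels (illustration only).
labels = ["cancer", "cancer", "normal", "normal"]
features = {
    "mir_a": ["up", "up", "down", "down"],   # separates the classes
    "mir_b": ["up", "down", "up", "down"],   # independent of the class
}
scores = {name: chi_square(values, labels) for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'mir_a': 4.0, 'mir_b': 0.0}
print(ranking)  # ['mir_a', 'mir_b']
```

A conventional filter would keep only the top of such a ranking; the abstract's point is that features lower in the ranking may still help.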

Archive | 2012

Evolutional Diagnostic Rules Mining for Heart Disease Classification Using ECG Signal Data

Minghao Piao; Yongjun Piao; Ho Sun Shon; Jang-Whan Bae; Keun Ho Ryu

Medical data sets are useful for the diagnosis and treatment of disease. With the development of technology and devices in biomedical engineering, the volume of such data has become overwhelming. Traditional data mining methods such as SVM, ANN and decision trees have been applied to the classification of arrhythmia. However, traditional analysis methods lack the capacity and speed to deal with information at this scale. Techniques that can handle incoming data sets in an incremental learning phase can solve those problems. Therefore, in this paper, we proposed an incremental decision tree induction method that uses an ensemble method to mine evolutional diagnostic rules for cardiac arrhythmia classification. Experimental results show that our proposed method performs better than the other algorithms in our study.

Fuzzy Systems and Knowledge Discovery | 2009

Emerging Patterns Based Methodology for Prediction of Patients with Myocardial Ischemia

Minghao Piao; Heon Gyu Lee; Gyo Yong Sohn; Gouchol Pok; Keun Ho Ryu

Heart disease is one of the most significant health problems in the world. Recently, the most serious problem is that patients are becoming younger. Therefore, it is important to find the early symptoms of heart problems for better treatment and to develop an effective methodology for predicting the disease. Data mining is one of the efficient approaches. However, some tasks remain to be solved. One is that the result should make it easy to explain the relationship between the class label and the predictors for the heart disease data. In this paper, a redefined T-tree algorithm is used to mine emerging patterns to address this problem. In addition, an aggregate score is used to build a classifier for the prediction task. The algorithms CMAR, CPAR, C4.5 and our method are applied to the dataset, and the proposed method shows better accuracy than the others (accuracy between 75% and 85%).
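An emerging pattern is usually defined as an itemset whose support rises sharply from one class to another, measured by its growth rate (the ratio of the two supports). The sketch below computes that ratio for a single pattern. It does not reproduce the T-tree mining or the aggregate score from the abstract, and the attribute names and records are invented for the example.

```python
def support(pattern, records):
    """Fraction of records (sets of items) that contain every item in pattern."""
    if not records:
        return 0.0
    return sum(1 for r in records if pattern <= r) / len(records)


def growth_rate(pattern, background, target):
    """Support in the target class divided by support in the background class."""
    s_bg = support(pattern, background)
    s_tg = support(pattern, target)
    if s_bg == 0:
        # Present only in the target class (or in neither class).
        return float("inf") if s_tg > 0 else 0.0
    return s_tg / s_bg


# Hypothetical discretized patient records (illustration only).
healthy = [
    {"st=normal", "age<50"},
    {"st=normal", "age>=50"},
    {"st=depressed", "age>=50"},
    {"st=normal", "age<50"},
]
ischemia = [
    {"st=depressed", "age>=50"},
    {"st=depressed", "age<50"},
    {"st=depressed", "age>=50"},
    {"st=normal", "age>=50"},
]
print(growth_rate({"st=depressed"}, healthy, ischemia))  # 3.0
```

Patterns with a high growth rate are easy to read as rules, which fits the interpretability goal stated in the abstract.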
