Publication


Featured research published by Yongjun Piao.


BMC Bioinformatics | 2015

Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data

Peipei Li; Yongjun Piao; Ho Sun Shon; Keun Ho Ryu

Background: Recently, rapid improvements in technology and decreasing sequencing costs have made RNA-Seq a widely used technique for quantifying gene expression levels. Because normalization is important in the analysis of RNA-Seq data, various normalization approaches have been proposed. A comparison of recently proposed normalization methods is required to generate suitable guidelines for selecting the most appropriate approach for future experiments. Results: In this paper, we compared eight non-abundance estimation normalization methods (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq data of 35- and 76-nucleotide reads produced in the MAQC project and on simulated reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that, of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For 35-nucleotide reads, RPKM showed the highest correlation, but for 76-nucleotide reads it showed a lower correlation than the other methods. ERPKM did not improve on RPKM. Of the two abundance estimation normalization methods, Sailfish gave a higher correlation than RSEM for 35-nucleotide reads, and RSEM was in turn better than not using abundance estimation. However, for 76-nucleotide reads, the results achieved by RSEM were similar to those obtained without abundance estimation and were much better than those obtained with Sailfish. Furthermore, we found that adding a poly-A tail increased the number of alignments but did not improve normalization results. Conclusion: Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We also suggest ignoring the poly-A tail during differential gene expression analysis.
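As a rough illustration of the kind of comparison this abstract describes, the sketch below computes RPKM and an upper-quartile (UQ) scaling for one toy sample and scores each against reference values with Spearman correlation. The counts, gene lengths, and reference values are invented for illustration, the UQ quantile rule is a simple assumption, and none of this is the authors' pipeline.

```python
from statistics import mean

def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]

def upper_quartile(counts):
    """Scale counts by an upper-quartile value of the non-zero counts.

    The quantile rule (index int(0.75 * (n - 1)) of the sorted values) is an
    assumption made for this sketch.
    """
    nonzero = sorted(c for c in counts if c > 0)
    q = nonzero[int(0.75 * (len(nonzero) - 1))]
    return [c / q for c in counts]

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical toy data: four genes in one sample.
counts = [100, 200, 300, 400]
lengths_bp = [1000, 2000, 1000, 4000]
reference = [1.0, 1.2, 3.5, 1.1]  # stand-in for qRT-PCR values

for name, values in [("RC", counts),
                     ("UQ", upper_quartile(counts)),
                     ("RPKM", rpkm(counts, lengths_bp))]:
    print(name, round(spearman(values, reference), 3))
```

In this single-sample toy setting, UQ is one global scaling factor, so it cannot change the ranking of genes and its Spearman correlation equals that of the raw counts. RPKM can change the ranking because it divides each count by its gene length.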


Bioinformatics | 2012

An ensemble correlation-based gene selection algorithm for cancer classification with gene expression data

Yongjun Piao; Minghao Piao; Kiejung Park; Keun Ho Ryu

Motivation: Gene selection for cancer classification is one of the most important topics in the biomedical field. However, microarray data pose a severe challenge for computational techniques. We need dimension reduction techniques that identify a small set of genes to achieve better learning performance. From the perspective of machine learning, the selection of genes can be considered to be a feature selection problem that aims to find a small subset of features that has the most discriminative information for the target. Results: In this article, we proposed an Ensemble Correlation-Based Gene Selection algorithm based on symmetrical uncertainty and Support Vector Machine. In our method, symmetrical uncertainty was used to analyze the relevance of the genes, the different starting points of the relevant subset were used to generate the gene subsets, and the Support Vector Machine was used as an evaluation criterion of the wrapper. The efficiency and effectiveness of our method were demonstrated through comparisons with other feature selection techniques, and the results show that our method outperformed other methods published in the literature.
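Symmetrical uncertainty is commonly defined as SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), where H is entropy and IG is information gain. The following is a minimal sketch of that standard definition for discrete values; it assumes gene expression has already been discretized, and it is not taken from the authors' implementation.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU in [0, 1]: 0 means independent, 1 means fully predictive."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    joint = entropy(list(zip(x, y)))
    info_gain = hx + hy - joint  # mutual information of x and y
    return 2.0 * info_gain / (hx + hy)

# Hypothetical toy data: one discretized gene and a class label.
gene = ["low", "low", "high", "high"]
label = ["normal", "normal", "tumor", "tumor"]
print(symmetrical_uncertainty(gene, label))
```

Normalizing by H(X) + H(Y) keeps the score between 0 and 1, which makes scores comparable across features with different numbers of distinct values.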


Mathematical Problems in Engineering | 2015

A New Ensemble Method with Feature Space Partitioning for High-Dimensional Data Classification

Yongjun Piao; Minghao Piao; Cheng Hao Jin; Ho Sun Shon; Ji-Moon Chung; Buhyun Hwang; Keun Ho Ryu

Ensemble data mining methods, also known as classifier combination, are often used to improve the performance of classification. Various classifier combination methods such as bagging, boosting, and random forest have been devised and have received considerable attention in the past. However, data dimensionality is increasing rapidly. This trend poses various challenges because these methods are not suitable for direct application to high-dimensional datasets. In this paper, we propose an ensemble method for classification of high-dimensional data, with each classifier constructed from a different set of features determined by partitioning of redundant features. In our method, the redundancy of features is used to divide the original feature space. Then, a support vector machine is trained on each generated feature subset, and the results of the classifiers are combined by majority voting. The efficiency and effectiveness of our method are demonstrated through comparisons with other ensemble techniques, and the results show that our method outperforms the other methods.
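The abstract does not spell out how redundant features are partitioned, so the sketch below shows one plausible reading rather than the authors' algorithm. It assumes that two features are redundant when their absolute Pearson correlation reaches a threshold, groups them greedily, and then deals the members of each group into different subsets. The function names, the threshold, and the grouping rule are all hypothetical.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0

def redundancy_groups(columns, threshold=0.9):
    """Greedy grouping: a feature joins the first group whose first member
    it correlates with at or above the threshold (an assumed rule)."""
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            if abs(pearson(columns[group[0]], col)) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    return groups

def partition_features(columns, n_subsets, threshold=0.9):
    """Deal features round-robin so that members of one redundancy group
    land in different subsets whenever the group has at most n_subsets members."""
    subsets = [[] for _ in range(n_subsets)]
    position = 0
    for group in redundancy_groups(columns, threshold):
        for j in group:
            subsets[position % n_subsets].append(j)
            position += 1
    return subsets

# Hypothetical toy data: each inner list is one feature across four samples.
columns = [
    [1, 2, 3, 4],
    [2, 4, 6, 8.1],    # nearly a multiple of feature 0
    [4, 1, 3, 2],      # unrelated to feature 0
    [1.1, 2, 3.2, 4],  # nearly a copy of feature 0
]
print(partition_features(columns, n_subsets=3))
```

Each resulting subset would then be used to train one base classifier, such as the support vector machine named in the abstract.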


International Conference on Big Data and Smart Computing | 2014

Ensemble method for classification of high-dimensional data

Yongjun Piao; Hyun Woo Park; Cheng Hao Jin; Keun Ho Ryu

Ensemble methods, also known as classifier combination, are often used to improve classification performance. The growing dimensionality of data poses various challenges for supervised learning. Commonly used classification methods such as decision trees, neural networks, and support vector machines are difficult to apply directly to high-dimensional datasets. In this paper, we propose an ensemble method for the classification of high-dimensional data, with each classifier constructed from a different set of features determined by partitioning redundant features. In our method, the redundancy of features is used to divide the original feature space. A support vector machine is then trained on each generated feature subset, and the results of the classifiers are combined by majority voting. The efficiency and effectiveness of our method were demonstrated through comparisons with other ensemble techniques, and the results showed that our method outperformed the other methods.
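Majority voting, the combination rule named in this abstract, can be sketched in a few lines. The example assumes each base classifier has already produced one label per sample. The tie-breaking rule (the label that appears first among the votes wins) is a property of this sketch, not something stated in the source.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions[m][i] is the label that classifier m assigns to sample i.
    Ties go to the label seen first among the votes, because
    Counter.most_common keeps first-seen order for equal counts.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three base classifiers on three samples.
predictions = [
    ["A", "B", "A"],
    ["A", "A", "B"],
    ["B", "B", "B"],
]
print(majority_vote(predictions))
```

Using an odd number of base classifiers avoids ties in two-class problems, though ties remain possible with more classes.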


Osong Public Health and Research Perspectives | 2014

A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs

Feifei Li; Minghao Piao; Yongjun Piao; Meijing Li; Keun Ho Ryu

Objectives: Many studies based on microRNA (miRNA) expression profiles have revealed a new aspect of cancer classification. Because miRNA expression data are high-dimensional, feature selection methods have been used to reduce dimensionality. These methods have had one shortcoming thus far: they consider only the cases in which the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such miRNAs are considered low-ranking by traditional feature selection methods and are removed most of the time. In view of the limited number of miRNAs, low-ranking miRNAs are also important for cancer classification. Methods: We considered both high- and low-ranking features to cover all cases (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifiers to construct the cancer classifiers. Then, we used the chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs for cancer classification. Results: Using low-ranking miRNA expression profiles achieved higher classification accuracy than using only the high-ranking miRNAs selected by traditional feature selection methods. Conclusion: Our results demonstrate that the m:n feature subset reveals the positive effect of low-ranking miRNAs on cancer classification.
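To make the idea of high- and low-ranking features concrete, the sketch below ranks discrete features by the chi-square statistic, one of the scoring methods listed in this abstract, and splits the ranking into a top part and a tail. It assumes discretized expression values. The split helper is hypothetical, and how the paper actually builds its m:n subset is not specified here.

```python
from collections import Counter

def chi_square(feature, labels):
    """Chi-square statistic of the feature-by-class contingency table."""
    n = len(labels)
    feature_counts = Counter(feature)
    class_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for f, nf in feature_counts.items():
        for c, nc in class_counts.items():
            expected = nf * nc / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat

def rank_features(columns, labels):
    """Feature indices sorted from highest to lowest chi-square score."""
    scores = [chi_square(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)

def split_by_rank(ranking, k):
    """Hypothetical helper: the top-k features and the remaining low-ranking ones."""
    return ranking[:k], ranking[k:]

# Hypothetical toy data: two discretized miRNAs and a class label.
labels = ["a", "a", "b", "b"]
columns = [
    [0, 1, 0, 1],  # independent of the label
    [0, 0, 1, 1],  # perfectly aligned with the label
]
high, low = split_by_rank(rank_features(columns, labels), k=1)
print(high, low)
```

A conventional filter would keep only the top-ranked indices, whereas the abstract argues that features in the low-ranking tail can also carry class information.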


Archive | 2012

Evolutional Diagnostic Rules Mining for Heart Disease Classification Using ECG Signal Data

Minghao Piao; Yongjun Piao; Ho Sun Shon; Jang-Whan Bae; Keun Ho Ryu

Medical data sets are useful for the diagnosis and treatment of disease. With the development of technology and devices in biomedical engineering, the volume of such data is now overwhelming. Traditional data mining methods such as SVM, ANN, and decision trees have been applied to the classification of arrhythmia. However, traditional analysis methods lack the capacity and speed to handle information on a large scale. Techniques that can handle incoming data sets in an incremental learning phase can solve these problems. Therefore, in this paper, we propose an incremental decision tree induction method that uses an ensemble method to mine evolutional diagnostic rules for cardiac arrhythmia classification. Experimental results show that our proposed method performs better than the other algorithms in our study.


Computers in Biology and Medicine | 2017

Multiclass cancer classification using a feature subset-based ensemble from microRNA expression profiles

Yongjun Piao; Minghao Piao; Keun Ho Ryu

Cancer classification has been a crucial topic of research in cancer treatment. In the last decade, messenger RNA (mRNA) expression profiles have been widely used to classify different types of cancers. With the discovery of a new class of small non-coding RNAs, known as microRNAs (miRNAs), various studies have shown that the expression patterns of miRNA can also accurately classify human cancers. Therefore, there is a great demand for the development of machine learning approaches to accurately classify various types of cancers using miRNA expression data. In this article, we propose a feature subset-based ensemble method in which each model is learned from a different projection of the original feature space to classify multiple cancers. In our method, the feature relevance and redundancy are considered to generate multiple feature subsets, the base classifiers are learned from each independent miRNA subset, and the average posterior probability is used to combine the base classifiers. To test the performance of our method, we used bead-based and sequence-based miRNA expression datasets and conducted 10-fold and leave-one-out cross validations. The experimental results show that the proposed method yields good results and has higher prediction accuracy than popular ensemble methods. The Java program and source code of the proposed method and the datasets in the experiments are freely available at https://sourceforge.net/projects/mirna-ensemble/.
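Combining base classifiers by average posterior probability can be sketched as follows. The example assumes each model outputs a mapping from class to probability for every sample. The data layout, function names, and alphabetical tie-breaking are choices made for this sketch, not details of the authors' Java program.

```python
def average_posteriors(posteriors):
    """Average class probabilities across models, sample by sample.

    posteriors[m][i] is a dict mapping class label to the probability that
    model m assigns to sample i. A class missing from a dict counts as 0.
    """
    n_models = len(posteriors)
    combined = []
    for per_sample in zip(*posteriors):
        classes = set().union(*per_sample)
        combined.append({
            c: sum(p.get(c, 0.0) for p in per_sample) / n_models
            for c in classes
        })
    return combined

def predict(posteriors):
    """Pick the class with the highest averaged probability.

    Classes are scanned in sorted order so that ties resolve deterministically.
    """
    return [max(sorted(avg), key=avg.get) for avg in average_posteriors(posteriors)]

# Hypothetical outputs of three base classifiers for one sample.
model_1 = [{"lung": 0.6, "colon": 0.4}]
model_2 = [{"lung": 0.3, "colon": 0.7}]
model_3 = [{"lung": 0.5, "colon": 0.5}]
print(predict([model_1, model_2, model_3]))
```

Unlike majority voting, averaging uses each model's confidence: in the toy data the hard votes split one to one with one undecided model, yet the averaged probabilities still favor a single class.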


Asian Conference on Intelligent Information and Database Systems | 2017

A Hybrid Feature Selection Method Based on Symmetrical Uncertainty and Support Vector Machine for High-Dimensional Data Classification

Yongjun Piao; Keun Ho Ryu

MicroRNA (miRNA) is a small, endogenous, non-coding RNA that plays a critical regulatory role in various biological processes. Recently, research based on miRNA expression profiles has revealed a new aspect of multiclass cancer classification. Because of the high dimensionality, however, classification of miRNA expression data poses several computational challenges. In this paper, we propose a hybrid feature selection method for accurate classification of various cancer types based on miRNA expression data. Symmetrical uncertainty is employed as the filter part, and a support vector machine with best-first search is used as the wrapper part. To validate the efficiency of the proposed method, we conducted several experiments on real bead-based miRNA expression datasets, and the results showed that our method significantly improved classification accuracy and outperformed existing feature selection methods.
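A filter-plus-wrapper pipeline of the kind this abstract describes can be sketched generically. The filter stage keeps features whose relevance score exceeds a threshold, and the wrapper stage searches over the survivors using a classifier-based score. Two simplifications are assumed here: the wrapper uses greedy forward selection as a stand-in for best-first search (which can also backtrack), and `evaluate` is a hypothetical callback standing in for, say, cross-validated SVM accuracy.

```python
def filter_stage(scores, threshold):
    """Keep feature indices whose relevance score exceeds the threshold,
    ordered from most to least relevant."""
    kept = [j for j, s in enumerate(scores) if s > threshold]
    return sorted(kept, key=lambda j: scores[j], reverse=True)

def wrapper_stage(candidates, evaluate):
    """Greedy forward selection: each round adds the single feature that
    raises the score the most, and stops when no feature improves it."""
    selected, best = [], float("-inf")
    improved = True
    while improved:
        improved = False
        choice = None
        for j in candidates:
            if j in selected:
                continue
            score = evaluate(selected + [j])
            if score > best:
                best, choice, improved = score, j, True
        if improved:
            selected.append(choice)
    return selected, best

# Hypothetical relevance scores (for example, symmetrical uncertainty).
scores = [0.9, 0.05, 0.6, 0.4]

def evaluate(subset):
    """Toy stand-in for classifier accuracy: features 0 and 2 help,
    and every added feature costs a small penalty."""
    useful = {0: 0.3, 2: 0.2}
    return 0.5 + sum(useful.get(j, 0.0) for j in subset) - 0.01 * len(subset)

candidates = filter_stage(scores, threshold=0.1)
print(wrapper_stage(candidates, evaluate))
```

Running the cheap filter first shrinks the search space, so the expensive wrapper evaluations are spent only on features that already look relevant.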


International Conference on Information Technology | 2017

A Hybrid Feature Selection Method to Classification and Its Application in Hypertension Diagnosis

Hyun Woo Park; Dingkun Li; Yongjun Piao; Keun Ho Ryu

Recently, various studies have shown that meaningful knowledge can be discovered by applying data mining techniques in medical applications such as decision support systems for disease diagnosis. However, several computational challenges remain because of the high dimensionality of medical data. Feature selection is an essential pre-processing procedure in data mining that identifies a relevant feature subset for classification. In this study, we propose a hybrid feature selection mechanism that combines symmetrical uncertainty and a Bayesian network. As a case study, we applied the proposed method to the hypertension diagnosis problem. The results showed that our method improved classification performance and outperformed existing feature selection techniques.


International Conference on Big Data and Smart Computing | 2017

Detection of differentially expressed genes using feature selection approach from RNA-seq

Yongjun Piao; Keun Ho Ryu

With the advance of next-generation sequencing technology, RNA-seq is being widely used for transcriptomics as an alternative to microarrays. RNA-seq has a broad range of applications such as gene expression quantification, alternative splicing identification, and novel transcript discovery. Generally, the primary aim of RNA-seq analysis is to detect genes that are differentially expressed under different biological conditions. From the data mining point of view, discovering differentially expressed genes can be seen as a feature selection problem that identifies the most significant genes for discriminating between biological conditions. Feature selection methods for differential analysis of microarray data are well established in the literature, but there are few studies on feature selection in RNA-seq experiments. In this paper, we propose applying a feature selection method from data mining to differential expression analysis. Symmetrical uncertainty is used to rank the genes, and significant genes are selected based on a pre-defined relevance threshold. To evaluate the proposed method, we conducted a simulation study to assess its performance in terms of true and false positive rates. The experimental results demonstrated that a feature selection strategy can be applied to differential analysis of RNA-seq data and that it outperformed existing statistical approaches.
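The selection and evaluation steps in this abstract can be sketched as a threshold on relevance scores followed by true and false positive rates against a simulated ground truth. The gene names, scores, and threshold below are invented, and treating a score equal to the threshold as selected is an assumption of this sketch.

```python
def select_by_threshold(scores, threshold):
    """Genes whose relevance score is at or above the threshold."""
    return {gene for gene, s in scores.items() if s >= threshold}

def tpr_fpr(selected, truly_de, all_genes):
    """True and false positive rates of a selection against ground truth.

    truly_de is the set of genes simulated as differentially expressed.
    """
    true_pos = len(selected & truly_de)
    false_pos = len(selected - truly_de)
    negatives = len(all_genes - truly_de)
    tpr = true_pos / len(truly_de) if truly_de else 0.0
    fpr = false_pos / negatives if negatives else 0.0
    return tpr, fpr

# Hypothetical relevance scores (for example, symmetrical uncertainty).
scores = {"g1": 0.8, "g2": 0.6, "g3": 0.1, "g4": 0.55, "g5": 0.05}
truly_de = {"g1", "g2", "g3"}

selected = select_by_threshold(scores, threshold=0.5)
print(tpr_fpr(selected, truly_de, set(scores)))
```

Sweeping the threshold and recording the two rates at each value traces out an ROC-style curve, which is one common way to compare a selection method with statistical tests in a simulation.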

Collaboration


Top Co-Authors

Keun Ho Ryu, Chungbuk National University
Minghao Piao, Chungbuk National University
Meijing Li, Chungbuk National University
Ho Sun Shon, Chungbuk National University
Hyun Woo Park, Chungbuk National University
Cheng Hao Jin, Chungbuk National University
Feifei Li, Chungbuk National University
Buhyun Hwang, Chonnam National University
Dingkun Li, Chungbuk National University
Jang-Whan Bae, Chungbuk National University