Publication


Featured research published by Jung Hun Oh.


Journal of Bioinformatics and Computational Biology | 2006

Proteomic biomarker identification for diagnosis of early relapse in ovarian cancer.

Jung Hun Oh; Animesh Nandi; Prem Gurnani; Lynne Knowles; John O. Schorge; Kevin P. Rosenblatt; Jean Gao

Ovarian cancer recurs at a rate of 75%, anywhere from a few months to several years after therapy. Early recurrence, although it responds better to treatment, is difficult to detect. Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry has shown potential for accurately identifying disease biomarkers that support early diagnosis. A major challenge in interpreting SELDI-TOF data is the high dimensionality of the feature space. To tackle this problem, we have developed a multi-step data processing method composed of a t-test, binning, and backward feature selection. For the backward feature selection step, we present a new algorithm, support vector machine-Markov blanket/recursive feature elimination (SVM-MB/RFE). This method integrates minimum-weight feature elimination by SVM-RFE with information-theoretic removal of redundant and irrelevant features by a Markov blanket. An SVM is then used for classification. We applied the biomarker selection algorithm to 113 serum samples to identify early relapse in ovarian cancer patients after primary therapy. To validate the proposed algorithm, we compared its performance with that of several other feature selection and classification algorithms.
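The backward elimination at the core of SVM-RFE can be sketched in a few lines. This is a minimal, standard-library Python sketch, not the authors' SVM-MB/RFE: it trains a linear SVM with a Pegasos-style subgradient solver (our choice of solver, with no bias term, so features are assumed to be centered), ranks features by squared weight, and drops the lowest-ranked feature each round. The t-test filter, binning, and Markov blanket redundancy removal are omitted, and the toy data and the names `train_linear_svm` and `svm_rfe` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent for a linear SVM without a bias term.

    X is a list of feature vectors; y is a list of labels in {-1, +1}.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularization step: shrink every weight
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient applies only to margin violators
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep):
    """Recursively drop the feature with the smallest squared SVM weight."""
    remaining = list(range(len(X[0])))  # original indices still in play
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining


# Hypothetical toy data: features 0 and 1 track the label (feature 0 more
# strongly); features 2 and 3 are balanced patterns unrelated to the label.
y = [1 if i % 2 == 0 else -1 for i in range(16)]
X = [
    [2.0 * y[i], 1.0 * y[i],
     1.0 if (i // 2) % 2 == 0 else -1.0,
     1.0 if (i // 4) % 2 == 0 else -1.0]
    for i in range(16)
]

selected = svm_rfe(X, y, n_keep=1)
```

Retraining after every elimination, rather than ranking once, is what distinguishes RFE from a single-pass weight filter: the remaining weights are re-estimated each time a feature leaves.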


BMC Bioinformatics | 2009

A kernel-based approach for detecting outliers of high-dimensional biological data

Jung Hun Oh; Jean Gao

Background
In many cases, biomedical data sets contain outliers that make reliable knowledge discovery difficult. Analyzing data without removing outliers can lead to wrong results and misleading information.

Results
We propose a new outlier detection method based on Kullback-Leibler (KL) divergence. KL divergence was originally designed to measure how one probability distribution differs from another. Building on that, we extend it to biological sample outlier detection by forming sample sets composed of nearest neighbors: the KL divergence is defined between two sample sets, one with and one without the test sample. To handle the non-linearity of the sample distribution, the original data are mapped into a higher-dimensional feature space. We also address the singularity problem that arises during the KL divergence calculation because of the small sample size. Kernel functions are applied to avoid direct use of the mapping functions. The performance of the proposed method is demonstrated on a synthetic data set, two public microarray data sets, and a mass spectrometry data set from a liver cancer study. Comparative studies against a Mahalanobis distance-based method and a one-class support vector machine (SVM) show that the proposed method performs better at finding outliers.

Conclusion
Our idea was derived from the Markov blanket algorithm, a feature selection method based on KL divergence. That is, whereas the Markov blanket algorithm removes redundant and irrelevant features, our proposed method detects outliers. Compared with other algorithms, the proposed method shows better or comparable performance on small-sample, high-dimensional biological data, which indicates that it can be used to detect outliers in biological data sets.
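For intuition about the quantity being computed: if each sample set is modeled as a $d$-dimensional Gaussian $\mathcal{N}(\mu, \Sigma)$ (an assumption made here for illustration; the abstract does not state the paper's distributional form, and the paper works in a kernel-induced feature space), the KL divergence has the standard closed form below. With fewer samples than dimensions, the estimated $\Sigma$ is singular, so $\Sigma^{-1}$ and $\ln\det\Sigma$ are undefined; this is the small-sample singularity the abstract refers to. Adding a ridge term, $\Sigma + \lambda I$ with $\lambda > 0$, is one common remedy, not necessarily the one used in the paper.

```latex
D_{\mathrm{KL}}\left(\mathcal{N}_0 \,\|\, \mathcal{N}_1\right)
  = \frac{1}{2}\left[
      \operatorname{tr}\left(\Sigma_1^{-1}\Sigma_0\right)
      + (\mu_1 - \mu_0)^{\top}\Sigma_1^{-1}(\mu_1 - \mu_0)
      - d
      + \ln\frac{\det\Sigma_1}{\det\Sigma_0}
    \right]
```

Swapping $\mathcal{N}_0$ and $\mathcal{N}_1$ generally changes the value, which is why KL divergence is not a true distance metric.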


International Conference of the IEEE Engineering in Medicine and Biology Society | 2009

An Extended Markov Blanket Approach to Proteomic Biomarker Detection From High-Resolution Mass Spectrometry Data

Jung Hun Oh; Prem Gurnani; John O. Schorge; Kevin P. Rosenblatt; Jean Gao

High-resolution matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry has recently shown promise as a screening tool for detecting discriminatory peptide/protein patterns. The major computational obstacle in finding such patterns is the large number of mass/charge peaks (features, biomarkers, data points) in a spectrum. To tackle this problem, we have developed methods for data preprocessing and biomarker selection. The preprocessing consists of binning, baseline correction, and normalization. For biomarker detection, we developed an algorithm, the extended Markov blanket, which combines redundant feature removal with discriminant feature selection. The biomarker selection is coupled with a support vector machine (SVM) to predict sample classes from high-resolution proteomic profiles. We apply our algorithm to a recurrent ovarian cancer study that contains platinum-sensitive and platinum-resistant samples after treatment. Experiments show that the proposed method performs better than other feature selection algorithms; in particular, it achieves good sensitivity and specificity compared with other methods.
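The three preprocessing steps can be sketched as follows. This is a minimal standard-library Python sketch under stated assumptions: the abstract does not specify the bin width, the baseline estimator, or the normalization, so the choices here (fixed-width bins over a shared m/z range, a moving-minimum baseline, and total-ion-current normalization), the toy spectrum, and the function names are all hypothetical.

```python
import math


def bin_spectrum(mz, intensity, mz_min, mz_max, bin_width):
    """Sum intensities into fixed-width m/z bins over [mz_min, mz_max)."""
    n_bins = math.ceil((mz_max - mz_min) / bin_width)
    binned = [0.0] * n_bins
    for m, v in zip(mz, intensity):
        if mz_min <= m < mz_max:
            # Guard against floating-point rounding at the upper edge
            idx = min(int((m - mz_min) // bin_width), n_bins - 1)
            binned[idx] += v
    return binned


def subtract_baseline(signal, window):
    """Subtract a moving-minimum baseline and clip negatives to zero."""
    half = window // 2
    out = []
    for i, v in enumerate(signal):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(max(0.0, v - min(signal[lo:hi])))
    return out


def normalize_tic(signal):
    """Divide by the total ion current so each spectrum sums to 1."""
    total = sum(signal)
    return [v / total for v in signal] if total > 0 else list(signal)


# Hypothetical toy spectrum
spectrum = bin_spectrum([1.0, 1.4, 2.2, 3.1, 3.9], [1, 1, 2, 3, 1], 1.0, 5.0, 1.0)
features = normalize_tic(subtract_baseline(spectrum, window=3))
```

Using one shared m/z range for every spectrum keeps bin i aligned across samples, which is what lets the binned intensities serve as a common feature vector. The baseline window should be wider than a typical peak; otherwise the moving minimum erodes the peaks themselves.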


Computer Methods and Programs in Biomedicine | 2009

Prostate cancer biomarker discovery using high performance mass spectral serum profiling

Jung Hun Oh; Yair Lotan; Prem Gurnani; Kevin P. Rosenblatt; Jean Gao

Prostate-specific antigen (PSA) is the most widely used serum biomarker for early detection of prostate cancer (PCA). Nevertheless, PSA levels can be falsely elevated by prostatic enlargement, inflammation, or infection, which limits the specificity of the PSA test. The objective of this study is to apply a machine learning approach to mass spectrometry data to discover more reliable biomarkers that distinguish PCA from benign specimens. Serum samples from 179 prostate cancer patients and 74 benign patients were analyzed. These samples were processed using ProXPRESSION Biomarker Enrichment Kits (PerkinElmer). Mass spectra were acquired using a prOTOF 2000 matrix-assisted laser desorption/ionization orthogonal time-of-flight (MALDI-O-TOF) mass spectrometer. In this study, we search for potential biomarkers using our feature selection method, the Extended Markov Blanket (EMB). With this marker selection algorithm, a panel of 26 peaks achieved an accuracy of 80.7%, a sensitivity of 83.5%, a specificity of 74.4%, a positive predictive value (PPV) of 87.9%, and a negative predictive value (NPV) of 68.2%. By contrast, PSA alone (with a cutoff of 4.0 ng/ml) yielded a sensitivity of 66.7%, a specificity of 53.6%, a PPV of 73.5%, and an NPV of 45.4%.
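All five reported figures derive from a 2×2 confusion matrix. The standard-library Python sketch below shows the standard definitions; the function names are hypothetical, and the counts in the usage line are made up for illustration and are not the study's data.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for boolean labels, where True means cancer."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic-test metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts, for illustration only
metrics = diagnostic_metrics(tp=8, fp=4, tn=6, fn=2)
```

Sensitivity and specificity describe the test alone, whereas PPV and NPV also depend on the proportion of cancer cases in the cohort (here 179 of 253 samples), so they shift when the same test is applied to a population with a different prevalence.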


Bioinformatics and Bioengineering | 2006

Prediction of labor for pregnant women using high-resolution mass spectrometry data

Jung Hun Oh; Animesh Nandi; Prem Gurnani; Peter Bryant-Greenwood; Kevin P. Rosenblatt; Jean Gao

High-resolution MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry has shown promise as a screening tool for detecting discriminatory protein patterns. The major computational obstacle in analyzing MALDI-TOF data is the large number of mass/charge peaks (a.k.a. features, data points). Because a single sample can easily exceed one million data points, efficient feature selection is critical for unequivocal protein pattern discovery. To tackle this problem, we have developed a multi-step strategy of data preprocessing followed by feature selection. The preprocessing is composed of binning, baseline correction, and normalization. For the preprocessed data, we propose a new feature subset selection method that is a hybrid filter/wrapper approach. Each feature is assigned a weight indicating its importance, based on two feature subsets associated with that feature: a highly correlated subset and a weakly correlated subset. Our scheme is applied to the analysis of a labor dataset to predict the delivery time of pregnant women. To validate the proposed algorithm, experiments compare it with other feature selection and classification methods. We show that our proposed approach outperforms the other algorithms.


Bioinformatics and Biomedicine | 2008

Biological Data Outlier Detection Based on Kullback-Leibler Divergence

Jung Hun Oh; Jean Gao; Kevin P. Rosenblatt

Outlier detection is imperative in biomedical data analysis for reliable knowledge discovery. In this paper, we present a new outlier detection method based on Kullback-Leibler (KL) divergence. KL divergence was originally designed to measure how one probability distribution differs from another. Building on that, we extend it to biological sample outlier detection by forming sample sets composed of nearest neighbors. To handle non-linearity during the KL divergence calculation and to tackle the singularity problem caused by small sample sizes, we map the original data into a higher-dimensional feature space and apply kernel functions, so the mapping function never has to be computed explicitly. The sample with the largest KL divergence is detected as an outlier. The proposed method is tested on one synthetic data set, two public gene expression data sets, and our own mass spectrometry data generated for a prostate cancer study.
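A simplified version of the scoring loop can be sketched in standard-library Python. Several assumptions differ from the paper as described: the sketch works in the original input space rather than a kernel feature space, models each sample set as a Gaussian with diagonal covariance, adds a small variance floor `eps` as a stand-in for the paper's handling of the singularity problem, and scores each sample by KL(with || without), a direction the abstract does not specify. The neighborhood size `k`, the toy data, and all function names are hypothetical.

```python
import math


def fit_diag_gaussian(points, eps):
    """Mean and per-dimension variance (plus a floor eps) of a point set."""
    n, d = len(points), len(points[0])
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    var = [sum((p[j] - mu[j]) ** 2 for p in points) / n + eps for j in range(d)]
    return mu, var


def kl_diag_gaussian(mu0, var0, mu1, var1):
    """KL(N0 || N1) for Gaussians with diagonal covariance."""
    return 0.5 * sum(
        v0 / v1 + (m1 - m0) ** 2 / v1 - 1.0 + math.log(v1 / v0)
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )


def kl_outlier_scores(X, k=3, eps=1e-3):
    """Score each sample by how much adding it changes its neighbor set."""
    scores = []
    for i, x in enumerate(X):
        others = [p for j, p in enumerate(X) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(p, x))[:k]
        mu_with, var_with = fit_diag_gaussian(neighbors + [x], eps)
        mu_without, var_without = fit_diag_gaussian(neighbors, eps)
        scores.append(kl_diag_gaussian(mu_with, var_with, mu_without, var_without))
    return scores


def detect_outlier(X, k=3, eps=1e-3):
    """Index of the sample with the largest KL divergence score."""
    scores = kl_outlier_scores(X, k, eps)
    return max(range(len(X)), key=scores.__getitem__)


# Hypothetical toy data: a tight cluster plus one distant point
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05], [5.0, 5.0]]
outlier = detect_outlier(X, k=3)
```

The variance floor keeps every denominator and logarithm finite even when a neighbor set has zero spread in some dimension, which is the small-sample failure mode the abstract describes.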


Bioinformatics and Bioengineering | 2007

Biomarker Selection for Predicting Alzheimer Disease Using High-Resolution MALDI-TOF Data

Jung Hun Oh; Young Bun Kim; Prem Gurnani; Kevin P. Rosenblatt; Jean Gao

High-resolution MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) mass spectrometry has shown promise as a screening tool for detecting discriminatory peptide/protein patterns. The major computational obstacle in analyzing MALDI-TOF data is the large number of mass/charge peaks (a.k.a. features, data points). With such a huge number of data points per sample, efficient feature selection is critical for unequivocal protein pattern discovery. In this paper, we propose a feature selection method and a new biclassification algorithm based on error-correcting output coding (ECOC) for multiclass problems. Our scheme is applied to the analysis of Alzheimer's disease (AD) data. To validate the proposed algorithm, experiments compare it with other methods. We show that our proposed framework outperforms not only the standard ECOC framework but also other algorithms.
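Standard ECOC decoding, which the proposed framework builds on, can be sketched as follows: each class gets a ±1 codeword, one binary classifier is trained per column, and a test sample is assigned to the class whose codeword is nearest in Hamming distance to the classifiers' outputs. This standard-library Python sketch shows only that baseline step, not the paper's biclassification extension; the three generic classes, the 5-bit code matrix, and the function names are hypothetical.

```python
# Hypothetical code matrix: one ±1 codeword per class. Every pair of
# codewords differs in at least 3 positions, so any single wrong bit
# can still be decoded to the correct class.
CODE_MATRIX = {
    "class_a": [1, 1, 1, 1, 1],
    "class_b": [1, -1, -1, -1, 1],
    "class_c": [-1, 1, -1, -1, -1],
}


def column_labels(code_matrix, labels, j):
    """Binary training targets for the classifier assigned to column j."""
    return [code_matrix[c][j] for c in labels]


def hamming(a, b):
    """Number of positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(code_matrix, bit_predictions):
    """Pick the class whose codeword is closest to the predicted bits."""
    return min(code_matrix, key=lambda c: hamming(code_matrix[c], bit_predictions))


# The third classifier is wrong, but decoding still recovers class_a
decoded = ecoc_decode(CODE_MATRIX, [1, 1, -1, 1, 1])
```

The error-correcting capacity comes from the minimum Hamming distance between codewords: with distance 3, up to one faulty binary classifier per sample is tolerated.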


Computational Intelligence in Bioinformatics and Computational Biology | 2005

Multicategory Classification using Extended SVM-RFE and Markov Blanket on SELDI-TOF Mass Spectrometry Data

Jung Hun Oh; Jean Gao; Animesh Nandi; Prem Gurnani; Lynne Knowles; John O. Schorge; Kevin P. Rosenblatt

Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry data have been increasingly analyzed to identify disease biomarkers that help detect disease early. Recently, a support vector machine (SVM) algorithm based on recursive feature elimination (RFE) was proposed to find a set of genes for cancer classification. In our study, we extend SVM-RFE so that it can be used for multicategory classification of SELDI-TOF mass spectrometry data, and we propose a new feature selection algorithm, SVM-MB/RFE (SVM-Markov Blanket/Recursive Feature Elimination). In the preprocessing stage of SVM-MB/RFE, ANOVA (analysis of variance) and binning are used for feature filtering. We demonstrate that this preprocessing improves performance. SVM-MB/RFE outperforms other methods, including SVM-RFE, the Markov blanket, PCA (principal component analysis) + LDA (linear discriminant analysis), and other feature selection algorithms.
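The ANOVA filtering step can be sketched as ranking each feature by its one-way ANOVA F statistic across classes and keeping the top-scoring ones. This standard-library Python sketch assumes a simple top-`n_keep` cutoff (the abstract does not say how many features are kept), and the toy data and function names are hypothetical. With two classes, the F statistic equals the square of the pooled-variance t statistic, so for two classes it ranks features the same way as a pooled-variance t-test.

```python
import math


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across class labels."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0.0:
        # Perfectly separated groups (or a constant feature)
        return math.inf if ss_between > 0.0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def anova_filter(X, labels, n_keep):
    """Indices of the n_keep features with the largest F statistic."""
    d = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(d)]
    ranked = sorted(range(d), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical toy data: feature 0 separates the three classes, feature 1 does not
labels = ["a", "a", "b", "b", "c", "c"]
X = [[1.0, 5.0], [1.2, 4.0], [3.0, 4.5], [3.1, 5.5], [6.0, 5.0], [5.9, 4.2]]
kept = anova_filter(X, labels, n_keep=1)
```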


Bioinformatics and Bioengineering | 2005

Peptide identification by tandem mass spectra: an efficient parallel searching

Jung Hun Oh; Jean Gao

De novo peptide sequencing, which determines the amino acid sequence of a peptide via tandem mass spectrometry (MS/MS), is increasingly used in proteomics for protein identification. Current de novo methods generally employ graph-theoretic approaches, which usually produce a large number of candidate sequences and incur heavy computational cost when trying to determine a sequence with less ambiguity. We present a novel de novo sequencing algorithm that greatly reduces the number of candidate sequences. Using certain properties of the b- and y-ion series in an MS/MS spectrum, we propose a reliable two-way parallel searching algorithm that filters the peptide candidates, which are then further pruned by an intensity-evidence-based screening criterion. We also find an adjusted value required to determine the position of the end node of the b- and y-ion series for doubly charged (+2) precursors in our graph. Our results are compared with those of PEAKS, a well-known de novo sequencing software package. Experimental results show that the six sequences are identical to the correct sequences. In the further pruning step, the rankings of our results remain unchanged even when the screening criterion changes. Therefore, the number of candidate sequences can be reduced by adopting a proper screening criterion.
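The b/y-ion relationship that a two-way search can exploit is easy to state in code. For a peptide of n residues, each prefix (b) ion b_i pairs with the complementary suffix (y) ion y_{n-i}, and for singly charged fragments their masses sum to the neutral peptide mass plus two protons; for a doubly charged (+2) precursor, that sum equals twice the precursor m/z. This standard-library Python sketch uses standard monoisotopic masses and shows only that relation; it is not the authors' search algorithm, the abstract does not say whether this relation is the "adjusted value" it mentions, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses in daltons (unmodified residues)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O
PROTON = 1.00728   # mass of a proton


def b_ions(seq):
    """Singly charged b-ion m/z values b_1 .. b_{n-1} (N-terminal prefixes)."""
    out, total = [], 0.0
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        out.append(total + PROTON)
    return out


def y_ions(seq):
    """Singly charged y-ion m/z values y_1 .. y_{n-1} (C-terminal suffixes)."""
    out, total = [], 0.0
    for aa in reversed(seq[1:]):
        total += RESIDUE_MASS[aa]
        out.append(total + WATER + PROTON)
    return out


def precursor_mz(seq, charge):
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    return (neutral + charge * PROTON) / charge


def complementary_pairs(seq):
    """Pair b_i with y_{n-i}; each pair sums to 2 * precursor m/z at charge 2."""
    b, y = b_ions(seq), y_ions(seq)
    n = len(seq)
    return [(b[i - 1], y[n - i - 1]) for i in range(1, n)]


pairs = complementary_pairs("PEPTIDE")
```

Because every complementary pair must hit the same total, a candidate prefix extended from the b side immediately constrains the matching suffix on the y side, which is what makes searching from both ends at once attractive.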


Bioinformatics and Biomedicine | 2007

A Novel Classification Method for Analysis of Multi-stage Diseases via Mass Spectrometric Data

Jung Hun Oh; Young Bun Kim; Jean Gao

Multi-category classification is one of the challenging issues in medical data analysis. We propose a new bi-classification algorithm for multi-class classification that combines two schemes: error-correcting output coding (ECOC) and pairwise coupling (PWC). After feature reduction in both schemes, each scheme's classification strategy is performed. For a test sample, the two class labels predicted by the two schemes are compared. If they are the same, the test sample is assigned that label; otherwise, a retraining method is employed, but only for samples whose predicted classes differ between the two schemes. Our scheme is applied to the analysis of a MALDI-TOF data set consisting of hepatocellular carcinoma (HCC) patients, cirrhosis patients, and healthy individuals. To validate the proposed algorithm, experiments compare it with other classification methods.
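The decision rule that combines the two schemes can be sketched as follows. In this standard-library Python sketch, a simple one-vs-one majority vote stands in for pairwise coupling (true pairwise coupling combines pairwise class probabilities rather than hard votes), and `retrain_predict` is a hypothetical callback representing the paper's retraining step, whose details the abstract does not give. The toy pairwise classifier and all function names are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_vote(pairwise_predict, classes, x):
    """Majority vote over all class pairs; ties go to the earlier class in `classes`."""
    votes = Counter(pairwise_predict(a, b, x) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])


def combine_predictions(ecoc_label, pwc_label, x, retrain_predict):
    """Keep the label when both schemes agree; otherwise defer to retraining."""
    if ecoc_label == pwc_label:
        return ecoc_label
    # Only disagreeing samples reach the retraining step, restricted here
    # (an assumption) to the two candidate labels.
    return retrain_predict(x, (ecoc_label, pwc_label))


CLASSES = ["HCC", "cirrhosis", "healthy"]


def toy_pairwise(a, b, x):
    """Hypothetical pairwise classifier that always favors HCC when present."""
    return "HCC" if "HCC" in (a, b) else a


pwc_label = one_vs_one_vote(toy_pairwise, CLASSES, x=None)
```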

Collaboration


Top Co-Authors

Jean Gao

University of Texas at Arlington

Kevin P. Rosenblatt

University of Texas Health Science Center at Houston

Prem Gurnani

University of Texas Southwestern Medical Center

Animesh Nandi

University of Texas Southwestern Medical Center

Lynne Knowles

University of Texas Southwestern Medical Center

Young Bun Kim

University of Texas at Arlington

Yair Lotan

University of Texas Southwestern Medical Center
