Publication


Featured research published by Kyu-Baek Hwang.


Bioinformatics | 2007

Discovery of microRNA–mRNA modules via population-based probabilistic learning

Je-Gun Joung; Kyu-Baek Hwang; Jin-Wu Nam; Soo Jin Kim; Byoung-Tak Zhang

MOTIVATION: MicroRNAs (miRNAs) and mRNAs constitute an important part of gene regulatory networks, influencing diverse biological phenomena. Elucidating closely related miRNAs and mRNAs can be an essential first step towards the discovery of their combinatorial effects on different cellular states. Here, we propose a probabilistic learning method to identify synergistic miRNAs involved in the regulation of their condition-specific target genes (mRNAs) from multiple information sources, i.e. computationally predicted target genes of miRNAs and their respective expression profiles.

RESULTS: We used data sets consisting of miRNA-target gene binding information and expression profiles of miRNAs and mRNAs on human cancer samples. Our method allowed us to detect functionally correlated miRNA-mRNA modules involved in specific biological processes from multiple data sources by using a balanced fitness function and efficient searching over multiple populations. The proposed algorithm found two miRNA-mRNA modules, highly correlated with respect to their expression and biological function. Moreover, the mRNAs included in the same module showed much higher correlations when the related miRNAs were highly expressed, demonstrating our method's ability to find coherent miRNA-mRNA modules. Most members of these modules have been reported to be closely related to cancer. Consequently, our method can provide a primary source of miRNA and target sets presumed to constitute closely related parts of gene regulatory pathways.


BMC Bioinformatics | 2004

Combining gene expression data from different generations of oligonucleotide arrays

Kyu-Baek Hwang; Sek Won Kong; Steven A. Greenberg; Peter J. Park

Background: One of the important challenges in microarray analysis is to take full advantage of previously accumulated data, both from one's own laboratory and from public repositories. Through a comparative analysis on a variety of datasets, a more comprehensive view of the underlying mechanism or structure can be obtained. However, as we discover in this work, continual changes in genomic sequence annotations and probe design criteria make it difficult to compare gene expression data even from different generations of the same microarray platform.

Results: We first describe the extent of discordance between the results derived from two generations of Affymetrix oligonucleotide arrays, as revealed in cluster analysis and in identification of differentially expressed genes. We then propose a method for increasing comparability. The dataset we use consists of a set of 14 human muscle biopsy samples from patients with inflammatory myopathies that were hybridized on both HG-U95Av2 and HG-U133A human arrays. We find that the use of the probe set matching table for comparative analysis provided by Affymetrix produces better results than matching by UniGene or LocusLink identifiers but still remains inadequate. Rescaling of expression values for each gene across samples and data filtering by expression values enhance comparability but only for a few specific analyses. As a generic method for improving comparability, we select a subset of probes with overlapping sequence segments in the two array types and recalculate expression values based only on the selected probes. We show that this filtering of probes significantly improves the comparability while retaining a sufficient number of probe sets for further analysis.

Conclusions: Compatibility between high-density oligonucleotide arrays is significantly affected by probe-level sequence information. With a careful filtering of the probes based on their sequence overlaps, data from different generations of microarrays can be combined more effectively.


Archive | 2002

Applying Machine Learning Techniques to Analysis of Gene Expression Data: Cancer Diagnosis

Kyu-Baek Hwang; Dong-Yeon Cho; Sangwook Park; Sung-Dong Kim; Byoung-Tak Zhang

Classification of patient samples is a crucial aspect of cancer diagnosis. DNA hybridization arrays simultaneously measure the expression levels of thousands of genes, and it has been suggested that gene expression may provide the additional information needed to improve cancer classification and diagnosis. This paper presents methods for analyzing gene expression data to classify cancer types. Machine learning techniques, such as Bayesian networks, neural trees, and radial basis function (RBF) networks, are used for the analysis of the CAMDA Data Set 2. These techniques have their own properties, including the ability to find important genes for cancer classification, reveal relationships among genes, and classify cancer. This paper reports a comparative evaluation of the experimental results of these methods.


PLOS ONE | 2011

Systemic Analysis of Heat Shock Response Induced by Heat Shock and a Proteasome Inhibitor MG132

Hee-Jung Kim; Hye Joon Joo; Yung Hee Kim; Soyeon Ahn; Jun Chang; Kyu-Baek Hwang; Dong-Hee Lee; Kong-Joo Lee

The molecular basis of heat shock response (HSR), a cellular defense mechanism against various stresses, is not well understood. In this, the first comprehensive analysis of gene expression changes in response to heat shock and MG132 (a proteasome inhibitor), both of which are known to induce heat shock proteins (Hsps), we compared the responses of the normal mouse fibrosarcoma cell line, RIF-1, and its thermotolerant variant cell line, TR-RIF-1 (TR), to the two stresses. The cellular responses we examined included Hsp expression, cell viability, total protein synthesis patterns, and accumulation of poly-ubiquitinated proteins. We also used microarray analysis to compare the mRNA expression profiles and kinetics in the two cell lines exposed to the two stresses. In contrast to RIF-1 cells, TR cells resist heat shock-induced changes in cell viability and whole-cell protein synthesis. The patterns of total cellular protein synthesis and accumulation of poly-ubiquitinated proteins in the two cell lines were distinct, depending on the stress and the cell line. Microarray analysis revealed that the gene expression response of TR cells to heat shock was faster and more transient than that of RIF-1 cells, while both RIF-1 and TR cells showed similar kinetics of mRNA expression in response to MG132. We also found that 2,208 genes were up-regulated more than 2-fold and sorted them into three groups: 1) genes regulated by both heat shock and MG132 (e.g. chaperones); 2) those regulated only by heat shock (e.g. DNA binding proteins including histones); and 3) those regulated only by MG132 (e.g. innate immunity and defense related molecules). This study shows that heat shock and MG132 share some aspects of the HSR signaling pathway while also inducing distinct stress response signaling pathways, triggered by distinct abnormal proteins.
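The three-way grouping described in this abstract amounts to set operations on the genes up-regulated under each stress. The following is a minimal sketch in Python, assuming per-gene fold-change values are already available. The function name, the dictionary layout, and the gene labels are hypothetical and are not taken from the study.

```python
def group_upregulated(fold_heat_shock, fold_mg132, threshold=2.0):
    """Split genes into three groups by which stress up-regulates them.

    Both arguments map a gene label to its fold change (assumed layout).
    A gene counts as up-regulated when its fold change exceeds `threshold`.
    """
    up_hs = {g for g, fc in fold_heat_shock.items() if fc > threshold}
    up_mg = {g for g, fc in fold_mg132.items() if fc > threshold}
    return {
        "both": up_hs & up_mg,
        "heat_shock_only": up_hs - up_mg,
        "mg132_only": up_mg - up_hs,
    }


# Placeholder labels and values, for illustration only.
example_groups = group_upregulated(
    {"gene_a": 5.0, "gene_b": 3.1, "gene_c": 1.2},
    {"gene_a": 4.2, "gene_b": 1.0, "gene_c": 2.6},
)
```

The 2-fold default mirrors the threshold quoted in the abstract; how fold change was actually computed in the study is not stated here.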


Pacific Rim International Conference on Artificial Intelligence | 2002

Construction of Large-Scale Bayesian Networks by Local to Global Search

Kyu-Baek Hwang; Jae Won Lee; Seung-Woo Chung; Byoung-Tak Zhang

Most existing algorithms for structural learning of Bayesian networks are suitable for constructing small-sized networks which consist of several tens of nodes. In this paper, we present a novel approach to the efficient and relatively precise induction of large-scale Bayesian networks with up to several hundreds of nodes. The approach is based on the concept of Markov blanket and makes use of the divide-and-conquer principle. The proposed method has been evaluated on two benchmark datasets and a real-life DNA microarray dataset, demonstrating the ability to learn the large-scale Bayesian network structure efficiently.
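For readers unfamiliar with the Markov blanket concept mentioned in this abstract: in a directed acyclic graph, a node's Markov blanket consists of its parents, its children, and the other parents of its children. The sketch below only reads the blanket off a known graph to illustrate the definition. It is not the paper's learning algorithm, which has to estimate such neighborhoods from data. The function name and the example graph are hypothetical.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to an iterable of its parent nodes.
    The blanket is: parents, children, and the children's other parents.
    """
    children = {c for c, ps in parents.items() if node in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents.get(node, ())) | children | co_parents) - {node}


# Illustrative graph: A -> B, A -> C, B -> D, C -> D, D -> E
example_dag = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}
blanket_of_b = markov_blanket(example_dag, "B")
```

Here the blanket of `B` contains its parent `A`, its child `D`, and `C`, which shares the child `D` with `B`.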


Bioinformatics | 2005

CrossChip: a system supporting comparative analysis of different generations of Affymetrix arrays

Sek Won Kong; Kyu-Baek Hwang; Richard D. Kim; Byoung-Tak Zhang; Steven A. Greenberg; Isaac S. Kohane; Peter J. Park

SUMMARY: To increase compatibility between different generations of Affymetrix GeneChip arrays, we propose a method of filtering probes based on their sequences. Our method is implemented as a web-based service for downloading necessary materials for converting the raw data files (*.CEL) for comparative analysis. The user can specify the appropriate level of filtering by setting the criteria for the minimum overlap length between probe sequences and the minimum number of usable probe pairs per probe set. Our website supports a within-species comparison for human and mouse GeneChip arrays.

AVAILABILITY: http://www.crosschip.org
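The two user-set criteria described in this summary can be illustrated with a small sketch. It assumes that overlap is measured as the longest contiguous substring shared by two probe sequences and that each probe is matched at most once. Both are simplifying assumptions, and the function names and toy sequences are hypothetical. The actual service may define overlap differently, for example by alignment position.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the match that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def filter_probe_set(probes_old, probes_new, min_overlap, min_probes):
    """Match probes across two array generations for one probe set.

    Returns the matched (old, new) sequence pairs, or an empty list when
    fewer than `min_probes` matches reach `min_overlap` shared bases.
    """
    matches = []
    used_new = set()
    for p_old in probes_old:
        for idx, p_new in enumerate(probes_new):
            if idx in used_new:
                continue
            if longest_common_substring(p_old, p_new) >= min_overlap:
                matches.append((p_old, p_new))
                used_new.add(idx)
                break
    return matches if len(matches) >= min_probes else []


# Toy sequences, far shorter than real probes, for illustration only.
old_probes = ["ACGTACGTAA", "TTTTTTTTTT"]
new_probes = ["GGACGTACGT", "CCCCCCCCCC"]
kept = filter_probe_set(old_probes, new_probes, min_overlap=8, min_probes=1)
```

Raising `min_probes` to 2 in this toy case discards the probe set entirely, which mirrors the trade-off between stricter filtering and the number of probe sets retained.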


Liver International | 2015

I148M variant in PNPLA3 reduces central adiposity and metabolic disease risks while increasing nonalcoholic fatty liver disease.

Jin-Ho Park; Belong Cho; Hyuktae Kwon; Daria Prilutsky; Jae Moon Yun; Ho Chun Choi; Kyu-Baek Hwang; In-Hee Lee; Jong-Il Kim; Sek Won Kong

The I148M variant, caused by a C-to-G substitution in PNPLA3 (rs738409), is associated with an increased risk of nonalcoholic fatty liver disease (NAFLD). In the liver, the I148M variant reduces the hydrolytic function of PNPLA3, which results in hepatic steatosis; however, its association with other clinical phenotypes such as adiposity and metabolic diseases is not well established.


Human Mutation | 2014

Prioritizing Disease-Linked Variants, Genes, and Pathways with an Interactive Whole-Genome Analysis Pipeline

In Hee Lee; Kyungjoon Lee; Michael Hsing; Yongjoon Choe; Jin Ho Park; Shu Hee Kim; Justin M. Bohn; Matthew B. Neu; Kyu-Baek Hwang; Robert C. Green; Isaac S. Kohane; Sek Won Kong

Whole-genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and nonrare diseases. Utilizing next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS with up-to-date genomic information is still challenging for biomedical researchers. Here, we present gNOME, one of the fastest and most scalable annotation, filtering, and analysis pipelines, to prioritize phenotype-associated variants while minimizing false-positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and the result summaries are provided at variant, gene, and genome levels. Moreover, the enrichment results of specific variants, genes, and gene sets between two groups, or compared with population-scale WGS datasets that are already integrated in the pipeline, can help the interpretation. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for the variants with complex impacts. Using two published whole-exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. The gNOME Web server and source code are freely available to the academic community (http://gnome.tchlab.org).


Systems, Man, and Cybernetics | 2005

Bayesian model averaging of Bayesian network classifiers over multiple node-orders: application to sparse datasets

Kyu-Baek Hwang; Byoung-Tak Zhang

Bayesian model averaging (BMA) can resolve the overfitting problem by explicitly incorporating the model uncertainty into the analysis procedure. Hence, it can be used to improve the generalization performance of Bayesian network classifiers. Until now, BMA of Bayesian network classifiers has only been performed in some restricted forms, e.g., the model is averaged given a single node-order, because of its heavy computational burden. However, it can be hard to obtain a good node-order when the available training dataset is sparse. To alleviate this problem, we propose BMA of Bayesian network classifiers over several distinct node-orders obtained using the Markov chain Monte Carlo sampling technique. The proposed method was examined using two synthetic problems and four real-life datasets. First, we show that the proposed method is especially effective when the given dataset is very sparse. The classification accuracy of averaging over multiple node-orders was higher in most cases than that achieved using a single node-order in our experiments. We also present experimental results for test datasets with unobserved variables, where the quality of the averaged node-order is more important. Through these experiments, we show that the difference in classification performance between the cases of multiple node-orders and single node-order is related to the level of noise, confirming the relative benefit of averaging over multiple node-orders for incomplete data. We conclude that BMA of Bayesian network classifiers over multiple node-orders has an apparent advantage when the given dataset is sparse and noisy, despite the method's heavy computational cost.
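The final averaging step can be sketched as follows, assuming each sampled node-order has already produced a vector of class probabilities for one test case. Because the orders are drawn by MCMC, the sketch weights them equally. That equal weighting, the function names, and the example numbers are assumptions made for illustration, not details confirmed by the abstract.

```python
def average_over_orders(per_order_probs):
    """Average class-probability vectors, one vector per sampled node-order."""
    n_orders = len(per_order_probs)
    n_classes = len(per_order_probs[0])
    return [
        sum(probs[c] for probs in per_order_probs) / n_orders
        for c in range(n_classes)
    ]


def predict_class(per_order_probs):
    """Pick the class index with the highest averaged probability."""
    averaged = average_over_orders(per_order_probs)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Made-up predictions from three sampled node-orders for a two-class problem.
example_probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
averaged_probs = average_over_orders(example_probs)
predicted = predict_class(example_probs)
```

Averaging lets a single poorly chosen node-order be outvoted by the others, which is the intuition behind the benefit the abstract reports on sparse data.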


Archive | 2002

Analysis of Gene Expression Profiles and Drug Activity Patterns by Clustering and Bayesian Network Learning

Jeong Ho Chang; Kyu-Baek Hwang; Byoung-Tak Zhang

High-throughput genomic analysis provides insight into complicated biological phenomena. However, the vast amount of data produced from up-to-date biological experimental processes needs appropriate data mining techniques to extract useful information. In this paper, we propose a method based on cluster analysis and Bayesian network learning for the molecular pharmacology of cancer. Specifically, the NCI60 dataset is analyzed by soft topographic vector quantization (STVQ) for cluster analysis and by Bayesian network learning for dependency analysis. Our results of the cluster analysis show that gene expression profiles are more related to the kind of cancer than to drug activity patterns. Dependency analysis using Bayesian networks reveals some biologically meaningful relationships among gene expression levels, drug activities, and cancer types, suggesting the usefulness of Bayesian network learning as a method for exploratory analysis of high-throughput genomic data.

Collaboration


Dive into Kyu-Baek Hwang's collaborations.

Top Co-Authors

Sek Won Kong

Boston Children's Hospital

In-Hee Lee

Seoul National University

Jeong Ho Chang

Seoul National University

Jin-Wu Nam

Seoul National University

Seong-Bae Park

Kyungpook National University
