
Publication


Featured research published by Kwang H. Lee.


Nucleic Acids Research | 2006

PLPD: reliable protein localization prediction from imbalanced and overlapped datasets

Ki-Young Lee; Dae-Won Kim; Dokyun Na; Kwang H. Lee; Doheon Lee

Subcellular localization is one of the key functional characteristics of proteins. An automatic and efficient method for predicting protein subcellular localization is much needed for large-scale genome analysis. From a machine learning point of view, a protein localization dataset has several characteristics: it has many classes (there are more than 10 localizations in a cell), it is multi-label (a protein may occur in several different subcellular locations), and it is highly imbalanced (the number of proteins in each localization differs remarkably). Although many previous works have addressed the prediction of protein subcellular localization, none of them effectively tackles all of these characteristics at the same time. Thus, a new computational method for protein localization is needed for more reliable outcomes. To address the issue, we present a protein localization predictor based on D-SVDD (PLPD), which can find the likelihood of a specific localization of a protein more easily and more correctly. Moreover, we introduce three measurements for the more precise evaluation of a protein localization predictor. On various datasets derived from the experiments of Huh et al. (2003), the proposed PLPD method represents a different approach that might play a complementary role to existing methods, such as the Nearest Neighbor method and the discriminate covariant method. Finally, after finding a good boundary for each localization using the 5184 classified proteins as training data, we predicted the localizations of 138 proteins whose subcellular localizations could not be clearly observed in the experiments of Huh et al. (2003).


BMC Bioinformatics | 2011

Building the process-drug-side effect network to discover the relationship between biological Processes and side effects

Sejoon Lee; Kwang H. Lee; Min Song; Doheon Lee

Background: Side effects are unwanted responses to drug treatment and are important resources for human phenotype information. The recent development of a database on side effects, the side effect resource (SIDER), is a first step in documenting the relationship between drugs and their side effects. It is, however, insufficient to simply find the association of drugs with biological processes; that relationship is crucial because drugs that influence biological processes can have an impact on phenotype. Therefore, knowing which processes respond to drugs that influence the phenotype will enable more effective and systematic study of the effect of drugs on phenotype. To the best of our knowledge, the relationship between biological processes and side effects of drugs has not yet been systematically researched.
Methods: We propose 3 steps for systematically searching for relationships between drugs and biological processes: enrichment score (ES) calculation, t-score calculation, and threshold-based filtering. Subsequently, the side effect-related biological processes are found by merging the drug-biological process network and the drug-side effect network. Evaluation is conducted in 2 ways: first, by counting the biological processes discovered by our method that co-occur with Gene Ontology (GO) terms in relation to side effects extracted from PubMed records using a text-mining technique, and second, by determining whether performance improves when the processes that respond to drugs sharing the same side effect are limited to the frequent ones alone.
Results: The multi-level network (the process-drug-side effect network) was built by merging the drug-biological process network and the drug-side effect network. We generated a network relating 74 drugs, 168 side effects, and 2209 biological processes. The preliminary results showed that the process-drug-side effect network was able to find meaningful relationships between biological processes and side effects in an efficient manner.
Conclusions: We propose a novel process-drug-side effect network for discovering the relationship between biological processes and side effects. By exploring the relationship between drugs and phenotypes through a multi-level network, the mechanisms underlying the effect of specific drugs on the human body may be understood.
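The merging step described in the Methods can be illustrated with a minimal sketch. It assumes both networks are available as plain mappings from a drug to a set of terms; the function name, the `min_drugs` frequency cutoff, and the drug and term names in the usage note are hypothetical, and the enrichment score and t-score calculations that produce the drug-biological process network are not shown.

```python
from collections import defaultdict


def merge_networks(drug_to_processes, drug_to_side_effects, min_drugs=1):
    """Link biological processes to side effects through shared drugs.

    Returns {(process, side_effect): set of drugs supporting the link}.
    min_drugs is a hypothetical cutoff that keeps only links supported
    by at least that many drugs.
    """
    links = defaultdict(set)
    for drug, processes in drug_to_processes.items():
        # A drug missing from the side effect network contributes no links.
        for side_effect in drug_to_side_effects.get(drug, ()):
            for process in processes:
                links[(process, side_effect)].add(drug)
    return {pair: drugs for pair, drugs in links.items() if len(drugs) >= min_drugs}
```

For example, with `{"drugA": {"p1", "p2"}, "drugB": {"p1"}}` as the drug-biological process network and `{"drugA": {"nausea"}, "drugB": {"nausea", "rash"}}` as the drug-side effect network, the pair `("p1", "nausea")` is supported by both drugs, so it is the only link that survives `min_drugs=2`.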


Proteins | 2006

Specificity of molecular interactions in transient protein-protein interaction interfaces

Kyu-il Cho; Ki-Young Lee; Kwang H. Lee; Dongsup Kim; Doheon Lee

In this study, we investigate what types of interactions are specific to their biological function, and what types of interactions are persistent regardless of their functional category in transient protein–protein heterocomplexes. This is the first approach to analyze protein–protein interfaces systematically at the molecular interaction level in the context of protein functions. We perform systematic analysis at the molecular interaction level using classification and feature subset selection techniques prevalent in the field of pattern recognition. To represent the physicochemical properties of protein–protein interfaces, we design 18 molecular interaction types using canonical and noncanonical interactions. Then, we construct an input vector using the frequency of each interaction type in a protein–protein interface. We analyze the 131 interfaces of transient protein–protein heterocomplexes in PDB: 33 protease‐inhibitors, 52 antibody‐antigens, and 46 signaling proteins including 4 cyclin‐dependent kinases and 26 G‐proteins. Using kNN classification and feature subset selection, we show that there are specific interaction types based on their functional category, and that such interaction types are conserved through the common binding mechanism, rather than through sequence or structure conservation. The extracted interaction types are the CαH···OC interaction, cation···anion interaction, amine···amine interaction, and amine···cation interaction. With these four interaction types, we achieve a classification success rate of up to 83.2% with leave‐one‐out cross‐validation at k = 15. Of these four interaction types, CαH···OC shows binding specificity for protease‐inhibitor complexes, while the cation···anion interaction is predominant in signaling complexes. The amine···amine and amine···cation interactions make a minor contribution to the classification accuracy: adding these two interactions increases the accuracy by 3.8%.
In the case of antibody–antigen complexes, the sign is somewhat ambiguous. From the evolutionary perspective, while protease‐inhibitors and signaling proteins have optimized their interfaces to suit their biological functions, antibody–antigen interactions are happenstance, implying that antibody–antigen complexes do not show distinctive interaction types. Persistent interaction types, such as π···π, amide‐carbonyl, and hydroxyl‐carbonyl interactions, are also investigated. Analyzing the structural orientations of the π···π stacking interactions, we find that the herringbone shape is a major configuration in transient protein–protein interfaces. This result differs from that of the protein core, where parallel‐displaced configurations are the major configuration. We also analyze the overall trend of amide‐carbonyl and hydroxyl‐carbonyl interactions. It is noticeable that nearly 82% of the interfaces have at least one hydroxyl‐carbonyl interaction.
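The evaluation protocol named in the abstract, kNN classification scored with leave-one-out cross-validation, can be sketched as follows. This is an illustration only: the abstract does not state the distance metric or tie-breaking rule, so Euclidean distance and first-seen tie-breaking are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def loo_knn_accuracy(vectors, labels, k):
    """Leave-one-out accuracy of a k-nearest-neighbour classifier.

    vectors: list of equal-length numeric feature vectors (for instance,
    interaction-type frequencies per interface); labels: one class per vector.
    Squared Euclidean distance is an assumption of this sketch.
    """
    correct = 0
    for i, query in enumerate(vectors):
        # Distance from the held-out vector to every other vector.
        neighbours = [
            (sum((a - b) ** 2 for a, b in zip(query, other)), labels[j])
            for j, other in enumerate(vectors)
            if j != i
        ]
        neighbours.sort(key=lambda pair: pair[0])
        # Majority vote among the k closest neighbours.
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(vectors)
```

Each interface is held out in turn and classified by the remaining ones, so every sample serves as a test case exactly once, which suits a dataset as small as the 131 interfaces described above.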


Computer-Based Medical Systems | 2007

Inferring Gene Regulatory Networks from Microarray Time Series Data Using Transfer Entropy

Thai Quang Tung; Taewoo Ryu; Kwang H. Lee; Doheon Lee

Reverse engineering of gene regulatory networks from microarray time series data has been a challenging problem due to the limited amount of available data. In this paper, a new approach is proposed based on the concept of transfer entropy. Using this information-theoretic measure, causal relations between pairs of genes are assessed to draw a causal network. A heuristic rule is then applied to differentiate direct and indirect causality. Simulation on a synthetic network showed that transfer entropy can identify both linear and nonlinear causality. Applying the method to biological data identified many causal interactions supported by biological information.
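To make the measure concrete, here is a minimal sketch of a plug-in transfer entropy estimate for two already-discretized time series, using a history length of 1. The discretization, the history length, and the function name are assumptions of this sketch; the paper's own estimator and its heuristic rule for separating direct from indirect causality are not described in the abstract and are not reproduced here.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum over (y_next, y_now, x_now) of
         p(y_next, y_now, x_now) * log2( p(y_next | y_now, x_now) / p(y_next | y_now) )
    Both inputs are equal-length sequences of discrete symbols.
    """
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    triples = Counter(
        (target[t + 1], target[t], source[t]) for t in range(len(target) - 1)
    )
    total = sum(triples.values())
    now_pairs = Counter()   # counts of (y_now, x_now)
    next_pairs = Counter()  # counts of (y_next, y_now)
    now_single = Counter()  # counts of y_now
    for (y_next, y_now, x_now), count in triples.items():
        now_pairs[(y_now, x_now)] += count
        next_pairs[(y_next, y_now)] += count
        now_single[y_now] += count
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        with_source = count / now_pairs[(y_now, x_now)]
        without_source = next_pairs[(y_next, y_now)] / now_single[y_now]
        te += (count / total) * log2(with_source / without_source)
    return te
```

If the target simply repeats the source with a one-step delay, knowing the source removes all uncertainty about the next target value and the estimate is positive; if the source is constant, it carries no information and the estimate is zero. With series as short as typical microarray time courses, plug-in estimates like this one are biased, so in practice they are usually compared against a significance threshold.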


Bioinformatics | 2007

Towards clustering of incomplete microarray data without the use of imputation

Dae-Won Kim; Ki-Young Lee; Kwang H. Lee; Doheon Lee

MOTIVATION: Clustering techniques are used to find groups of genes that show similar expression patterns under multiple experimental conditions. Nonetheless, the results obtained by cluster analysis are influenced by the missing values that commonly arise in microarray experiments. Because a clustering method requires a complete data matrix as input, previous studies have estimated the missing values using an imputation method in the preprocessing step of clustering. However, a common limitation of these conventional approaches is that once the estimates of missing values are fixed in the preprocessing step, they are not changed during the subsequent clustering process; badly estimated missing values obtained in data preprocessing are likely to degrade the quality and reliability of clustering results. Thus, a new clustering method is required that improves the estimates of missing values during the iterative clustering process. RESULTS: We present a method for Clustering Incomplete data using Alternating Optimization (CIAO) in which a prior imputation method is not required. To reduce the influence of imputation in preprocessing, we take an alternating optimization approach to find better estimates during the iterative clustering process. This method improves the estimates of missing values in each iteration by exploiting cluster information, such as cluster centroids, and all available non-missing values. To test the performance of CIAO, we applied CIAO and conventional imputation-based clustering methods, e.g. k-means based on KNNimpute, to two incomplete yeast data sets, and compared the clustering result of each method using the Saccharomyces Genome Database annotations. The clustering results of the CIAO method are more significantly relevant to the biological gene annotations than those of the other methods, indicating its effectiveness and potential for clustering incomplete gene expression data.
AVAILABILITY: The software was developed in Java and can be executed on any platform running a JVM (Java Virtual Machine). It is available from the authors upon request.
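The general idea of re-estimating missing values inside the clustering loop, rather than fixing them beforehand, can be sketched with a k-means-style procedure. This is not the authors' CIAO algorithm, whose objective function and update rules are not given in the abstract; it is an assumed, simplified illustration in which each missing entry is refilled from the centroid of the row's current cluster. The function names and the use of `None` to mark missing values are choices made for this sketch.

```python
def _observed_distance(row, centroid):
    """Squared distance over the coordinates that are observed in row."""
    return sum((v - centroid[j]) ** 2 for j, v in enumerate(row) if v is not None)


def cluster_incomplete(rows, initial_centroids, iterations=20):
    """Alternate between assigning rows, refilling missing values, and
    updating centroids. rows use None for missing entries.

    Returns (labels, filled_rows, centroids).
    """
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(rows)
    filled = [list(r) for r in rows]
    for _ in range(iterations):
        # Step 1: assign each row to the nearest centroid, using observed values only.
        labels = [
            min(range(len(centroids)), key=lambda k: _observed_distance(row, centroids[k]))
            for row in rows
        ]
        # Step 2: re-estimate missing entries from the assigned centroid.
        filled = [
            [centroids[labels[i]][j] if v is None else v for j, v in enumerate(row)]
            for i, row in enumerate(rows)
        ]
        # Step 3: recompute centroids from the filled rows; empty clusters keep theirs.
        for k in range(len(centroids)):
            members = [filled[i] for i in range(len(rows)) if labels[i] == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, filled, centroids
```

Because the estimates are refreshed every iteration, a poor initial guess does not stay frozen into the result, which is the limitation of preprocessing-time imputation that the abstract describes.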


Data Mining in Bioinformatics | 2011

Predicting disease phenotypes based on the molecular networks with Condition-Responsive Correlation

Sejoon Lee; Eunjung Lee; Kwang H. Lee; Doheon Lee

Network-based methods that integrate molecular interaction networks with gene expression profiles have been proposed to solve problems arising from the small number of samples compared with the large number of predictors. However, previous network-based methods have focused only on the expression levels of proteins (the nodes in the network) and have not identified condition-responsive interactions. We propose a novel network-based classification method that focuses on both nodes with discriminative expression levels and edges with Condition-Responsive Correlations (CRCs) across two phenotypes. We found that modules with condition-responsive interactions provide candidate molecular models for diseases and show improved performance compared with conventional gene-centric classification methods.


SCIS & ISIS 2010 | 2010

WittyCG: A Web Based Platform for Integrated Cancer Gene Set Test

Yongdeuk Hwang; Kwang H. Lee; Doheon Lee

As cancer-related research has increased, various cancer studies have produced many candidate cancer-related gene sets. However, there are currently few tools that provide independent gene set tests. We provide a convenient and intuitive tool (WittyCG) for independent cancer gene set testing. WittyCG (Witness to your Cancer Gene sets) supports independent gene set tests across various tissues and experiment types. In addition, WittyCG illustrates the results of gene set tests graphically with statistical information.


International Conference of the IEEE Engineering in Medicine and Biology Society | 2007

Speed Estimation From a Tri-axial Accelerometer Using Neural Networks

Yoonseon Song; Seung Chul Shin; Seunghwan Kim; Doheon Lee; Kwang H. Lee


Proceedings of the International Conference of the Korean Institute of Intelligent Systems | 2007

Fuzzy Association Rule Mining for Microarray Time Series Analysis

Inho Park; Doheon Lee; Kwang H. Lee


Proceedings of the International Conference of the Korean Institute of Intelligent Systems | 2005

An Efficient Learning Method for Large Bayesian Networks using Clustering

Sungwon Jung; Kwang H. Lee; Doheon Lee

Collaboration


Top Co-Authors

Sejoon Lee

Samsung Medical Center

Hojung Nam

Gwangju Institute of Science and Technology
