SungHwan Kim
University of Pittsburgh
Publication
Featured research published by SungHwan Kim.
Science Translational Medicine | 2013
Jose D. Herazo-Maya; Imre Noth; Steven R. Duncan; SungHwan Kim; Shwu Fan Ma; George C. Tseng; Eleanor Feingold; Brenda Juan-Guardela; Thomas J. Richards; Yves A. Lussier; Yong Huang; Rekha Vij; Kathleen O. Lindell; Jianmin Xue; Kevin F. Gibson; Steven D. Shapiro; Joe G. N. Garcia; Naftali Kaminski
Genome-scale transcriptomic profiling of peripheral blood mononuclear cells from patients with idiopathic pulmonary fibrosis reveals that decreased expression of CD28, ICOS, LCK, and ITK predicts mortality.
Gene Signature Predicts Mortality
Idiopathic pulmonary fibrosis (IPF) is a fatal disease that progresses at different rates. Although no therapies exist, giving patients a more accurate prognosis is highly desirable. To this end, Herazo-Maya and colleagues searched the genomes of cells circulating in the blood of IPF patients and found that four genes may be indicators of poor outcome. Patients were recruited into discovery or replication cohorts from two different medical centers in the United States and followed until death or completion of the study. In both groups, genetic material was isolated from the patients’ peripheral blood mononuclear cells (PBMCs) and analyzed for increased or decreased expression. These gene expression profiles were then correlated with transplant-free survival (TFS). In the discovery cohort, Herazo-Maya et al. found that underexpression of the genes CD28, ICOS, LCK, and ITK was associated with decreased TFS. These findings were confirmed in the replication cohort. This “genomic model” incorporating the four genes was combined with the clinical outputs age, gender, and forced vital capacity to create an even stronger predictor of poor outcome. The authors suggest that the decreased expression of these genes might be linked to lower percentages of CD4+CD28+ T cells in the PBMC population, which could contribute to a mechanistic understanding of why some IPF patients progress differently than others. The findings of this study have the potential to affect the care of patients with IPF as well as the understanding of disease mechanism. However, the combined genomic and clinical predictor will need to be validated in additional independent cohorts before translation.
We aimed to identify peripheral blood mononuclear cell (PBMC) gene expression profiles predictive of poor outcomes in idiopathic pulmonary fibrosis (IPF) by performing microarray experiments of PBMCs in discovery and replication cohorts of IPF patients. Microarray analyses identified 52 genes associated with transplant-free survival (TFS) in the discovery cohort. Clustering the microarray samples of the replication cohort using the 52-gene outcome-predictive signature distinguished two patient groups with significant differences in TFS. We studied the pathways associated with TFS in each independent microarray cohort and identified decreased expression of “The costimulatory signal during T cell activation” Biocarta pathway and, in particular, the genes CD28, ICOS, LCK, and ITK, results confirmed by quantitative reverse transcription polymerase chain reaction (qRT-PCR). A proportional hazards model, including the qRT-PCR expression of CD28, ICOS, LCK, and ITK along with patient’s age, gender, and percent predicted forced vital capacity (FVC%), demonstrated an area under the receiver operating characteristic curve of 78.5% at 2.4 months for death and lung transplant prediction in the replication cohort. To evaluate the potential cellular source of CD28, ICOS, LCK, and ITK expression, we analyzed and found significant correlation of these genes with the PBMC percentage of CD4+CD28+ T cells in the replication cohort. Our results suggest that CD28, ICOS, LCK, and ITK are potential outcome biomarkers in IPF and should be further evaluated for patient prioritization for lung transplantation and stratification in drug studies.
Nucleic Acids Research | 2016
Silvia Liu; Wei-Hsiang Tsai; Ying Ding; Rui Chen; Zhou Fang; Zhiguang Huo; SungHwan Kim; Tianzhou Ma; Ting-Yu Chang; Nolan Priedigkeit; Adrian V. Lee; Jian-Hua Luo; Hsei-Wei Wang; I-Fang Chung; George C. Tseng
Background: Fusion transcripts are formed by either fusion genes (DNA level) or trans-splicing events (RNA level). They have been recognized as a promising tool for diagnosing, subtyping and treating cancers. RNA-seq has become a precise and efficient standard for genome-wide screening of such aberration events. Many fusion transcript detection algorithms have been developed for paired-end RNA-seq data, but their performance has not been comprehensively evaluated to guide practitioners. In this paper, we evaluated 15 popular algorithms by their precision and recall trade-off, accuracy of supporting reads and computational cost. We further combined top-performing methods for improved ensemble detection. Results: Fifteen fusion transcript detection tools were compared using three synthetic data sets under different coverage, read length, insert size and background noise, and three real data sets with selected experimental validations. No single method dominated, but SOAPfuse generally performed well, followed by FusionCatcher and JAFFA. We further demonstrated the potential of a meta-caller algorithm that combines top-performing methods to re-prioritize candidate fusion transcripts with high confidence, which can then be followed by experimental validation. Conclusion: Our results provide insightful recommendations for applying individual tools or combining top performers to identify fusion transcript candidates.
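The precision and recall trade-off and the idea of combining callers can be illustrated with a small sketch. The abstract does not specify how the meta-caller re-prioritizes candidates, so the majority-vote rule below is only an assumption for illustration; the tool labels, fusion names and the functions `precision_recall` and `vote_meta_caller` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def precision_recall(called, truth):
    """Return (precision, recall) of a set of called fusions against a truth set."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    precision = true_pos / len(called) if called else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


def vote_meta_caller(calls_by_tool, min_votes=2):
    """Keep fusions reported by at least `min_votes` tools (assumed voting rule)."""
    votes = Counter(f for calls in calls_by_tool.values() for f in set(calls))
    return {fusion for fusion, n in votes.items() if n >= min_votes}


# Toy data: made-up fusion names and tool labels, not real results.
truth = {"A-B", "C-D", "E-F"}
calls_by_tool = {
    "tool1": {"A-B", "C-D", "X-Y"},
    "tool2": {"A-B", "E-F", "P-Q"},
    "tool3": {"A-B", "C-D", "E-F", "M-N"},
}

consensus = vote_meta_caller(calls_by_tool, min_votes=2)
for name, calls in calls_by_tool.items():
    print(name, precision_recall(calls, truth))
print("consensus", precision_recall(consensus, truth))
```

In this toy case the false positives are each reported by a single tool, so the vote removes them; real callers can share systematic errors, which is why the abstract recommends experimental validation of re-prioritized candidates.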
BMC Genomics | 2015
SungHwan Kim; Jose D. Herazo-Maya; Dongwan D. Kang; Brenda Juan-Guardela; John Tedrow; Fernando J. Martinez; Frank C. Sciurba; George C. Tseng; Naftali Kaminski
Background: The increased multi-omics information on carefully phenotyped patients in studies of complex diseases requires novel methods for data integration. Unlike continuous intensity measurements from most omics data sets, phenome data contain clinical variables that are binary, ordinal and categorical. Results: In this paper we introduce an integrative phenotyping framework (iPF) for disease subtype discovery. A feature topology plot was developed for effective dimension reduction and visualization of multi-omics data. The approach is free of model assumptions and robust to data noise or missingness. We developed a workflow to integrate homogeneous patient clustering from different omics data in an agglomerative manner and then visualized heterogeneous clustering of pairwise omics sources. We applied the framework to two batches of lung samples obtained from patients diagnosed with chronic obstructive lung disease (COPD) or interstitial lung disease (ILD) with well-characterized clinical (phenomic) data, mRNA and microRNA expression profiles. Application of iPF to the first training batch identified clusters of patients consisting of homogeneous disease phenotypes as well as clusters with intermediate disease characteristics. Analysis of the second batch revealed a similar data structure, confirming the presence of intermediate clusters. Genes in the intermediate clusters were enriched with inflammatory and immune functional annotations, suggesting that they represent mechanistically distinct disease subphenotypes that may respond to immunomodulatory therapies. The iPF software package and all source code are publicly available. Conclusions: Identification of subclusters with distinct clinical and biomolecular characteristics suggests that integration of phenomic and other omics information could lead to identification of novel mechanism-based disease sub-phenotypes.
Clinical Cancer Research | 2014
Swati Suryawanshi; Xin Huang; Esther Elishaev; Raluca Budiu; Lixin Zhang; SungHwan Kim; Nicole Donnellan; Gina Mantia-Smaldone; Tianzhou Ma; George C. Tseng; T. Lee; Suketu Mansuria; Robert P. Edwards; Anda M. Vlad
Purpose: Mechanisms of immune dysregulation associated with advanced tumors are relatively well understood. Much less is known about the role of immune effectors against cancer precursor lesions. Endometrioid and clear-cell ovarian tumors partly derive from endometriosis, a commonly diagnosed chronic inflammatory disease. Here we performed a comprehensive immune gene expression analysis of pelvic inflammation in endometriosis and endometriosis-associated ovarian cancer (EAOC). Experimental Design: RNA was extracted from 120 paraffin tissue blocks comprising normal endometrium (n = 32), benign endometriosis (n = 30), atypical endometriosis (n = 15), and EAOC (n = 43). Serous tumors (n = 15) were included as nonendometriosis-associated controls. The immune microenvironment was profiled using Nanostring and the nCounter GX Human Immunology Kit, comprising probes for a total of 511 immune genes. Results: One third of the patients with endometriosis revealed a tumor-like inflammation profile, suggesting that cancer-like immune signatures may develop earlier, in patients classified as clinically benign. Gene expression analyses revealed the complement pathway as most prominently involved in both endometriosis and EAOC. Complement proteins are abundantly present in epithelial cells in both benign and malignant lesions. Mechanistic studies in ovarian surface epithelial cells from mice with conditional (Cre-loxP) mutations show intrinsic production of complement in epithelia and demonstrate an early link between Kras- and Pten-driven pathways and complement upregulation. Downregulation of complement in these cells interferes with cell proliferation. Conclusions: These findings reveal new characteristics of inflammation in precursor lesions and point to previously unknown roles of complement in endometriosis and EAOC. Clin Cancer Res; 20(23); 6163–74. ©2014 AACR.
Biostatistics | 2017
SungHwan Kim; Steffi Oesterreich; Seyoung Kim; Yongseok Park; George C. Tseng
With the rapid advances in microarray and massively parallel sequencing technologies, data of multiple omics sources from a large patient cohort are now frequently seen in many consortium studies. Effective multi-level omics data integration has brought new statistical challenges. One important biological objective of such integrative analysis is to cluster patients in order to identify clinically relevant disease subtypes, which will form the basis for tailored treatment and personalized medicine. Several methods have been proposed in the literature for this purpose, including the popular iCluster method used in many cancer applications. When clustering high-dimensional omics data, effective feature selection is critical for better clustering accuracy and biological interpretation. It is also common that a portion of scattered samples has patterns distinct from all major clusters and should not be assigned to any cluster, as they may represent a rare disease subcategory or be in transition between disease subtypes. In this paper, we first propose to improve feature selection of the iCluster factor model by an overlapping sparse group lasso penalty on the omics features using prior knowledge of inter-omics regulatory flows. We then perform regularization over samples to allow clustering with scattered samples and generate tight clusters. The proposed group structured tight iCluster method is evaluated on two real breast cancer examples and simulations to demonstrate its improved clustering accuracy, biological interpretation, and ability to generate coherent tight clusters.
Bioinformatics | 2016
SungHwan Kim; Chien-Wei Lin; George C. Tseng
Motivation: Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is an example that applies a simple rank-based algorithm to identify rank-altered gene pairs for classifier construction. Although many classification methods perform well in cross-validation of a single expression profile, the performance usually drops considerably in cross-study validation (i.e. the prediction model is established in the training study and applied to an independent test study) for all machine learning methods, including TSP. The failure of cross-study validation has largely diminished the potential translational and clinical value of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies. Results: We proposed two frameworks, by averaging TSP scores or by combining P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods to simulated data sets and three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The results showed superior cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases robustness and accuracy of the classification model, which will ultimately improve disease understanding and clinical treatment decisions to benefit patients. Availability and implementation: An R package MetaKTSP is available online (http://tsenglab.biostat.pitt.edu/software.htm). Supplementary information: Supplementary data are available at Bioinformatics online.
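A minimal sketch of the score-averaging idea, assuming the standard TSP score for a gene pair (i, j), namely the absolute difference between the two classes in the frequency with which gene i is expressed below gene j. The functions `tsp_score` and `mean_score_top_pair` and the toy data are hypothetical illustrations; they are not the API of the MetaKTSP R package, and the sketch selects a single pair rather than the top K pairs.

```python
from itertools import combinations


def tsp_score(samples, labels, i, j):
    """|P(x_i < x_j | class 0) - P(x_i < x_j | class 1)| for genes i and j."""
    def freq(cls):
        rows = [s for s, y in zip(samples, labels) if y == cls]
        return sum(s[i] < s[j] for s in rows) / len(rows)
    return abs(freq(0) - freq(1))


def mean_score_top_pair(studies):
    """Return the gene pair whose TSP score, averaged over studies, is largest."""
    n_genes = len(studies[0][0][0])

    def avg(pair):
        return sum(tsp_score(x, y, *pair) for x, y in studies) / len(studies)

    return max(combinations(range(n_genes), 2), key=avg)


# Two toy studies (rows = samples, columns = genes) on different scales.
study1 = ([[1, 5, 3], [2, 6, 1], [7, 2, 4], [8, 1, 5]], [0, 0, 1, 1])
study2 = ([[10, 50, 60], [20, 40, 70], [90, 30, 80], [80, 10, 90]], [0, 0, 1, 1])

best = mean_score_top_pair([study1, study2])
print(best)
```

Because the score depends only on within-sample rank comparisons, the two studies can be combined without putting their expression values on a common scale. In the toy data, genes 1 and 2 separate the classes perfectly in the first study but not at all in the second, so averaging favors the pair that is consistent across both.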
Bioinformatics | 2018
SungHwan Kim; Dongwan D. Kang; Zhiguang Huo; Yongseok Park; George C. Tseng
Motivation: With the prevalent usage of microarray and massively parallel sequencing, numerous high-throughput omics datasets have become available in the public domain. Integrating abundant information among omics datasets is critical to elucidate biological mechanisms. Due to the high-dimensional nature of the data, methods such as principal component analysis (PCA) have been widely applied, aiming at effective dimension reduction and exploratory visualization. Results: In this article, we combine multiple omics datasets of identical or similar biological hypothesis and introduce two variations of meta-analytic framework of PCA, namely MetaPCA. Regularization is further incorporated to facilitate sparse feature selection in MetaPCA. We apply MetaPCA and sparse MetaPCA to simulations, three transcriptomic meta-analysis studies in yeast cell cycle, prostate cancer, mouse metabolism and a TCGA pan-cancer methylation study. The result shows improved accuracy, robustness and exploratory visualization of the proposed framework. Availability and implementation: An R package MetaPCA is available online (http://tsenglab.biostat.pitt.edu/software.htm). Supplementary information: Supplementary data are available at Bioinformatics online.
Bioinformatics | 2018
Tianzhou Ma; Zhiguang Huo; Anche Kuo; Li Zhu; Zhou Fang; Xiangrui Zeng; Chien-Wei Lin; Silvia Liu; Lin Wang; Peng Liu; Tanbin Rahman; Lun-Ching Chang; SungHwan Kim; Jia Li; Yongseok Park; Chi Song; Steffi Oesterreich; Etienne Sibille; George C. Tseng
Summary: The rapid advances of omics technologies have generated abundant genomic data in public repositories and effective analytical approaches are critical to fully decipher biological knowledge inside these data. Meta-analysis combines multiple studies of a related hypothesis to improve statistical power, accuracy and reproducibility beyond individual study analysis. To date, many transcriptomic meta-analysis methods have been developed, yet few thoughtful guidelines exist. Here, we introduce a comprehensive analytical pipeline and browser-based software suite, called MetaOmics, to meta-analyze multiple transcriptomic studies for various biological purposes, including quality control, differential expression analysis, pathway enrichment analysis, differential co-expression network analysis, prediction, clustering and dimension reduction. The pipeline includes many public as well as >10 in-house transcriptomic meta-analytic methods with data-driven and biological-aim-driven strategies, hands-on protocols, an intuitive user interface and step-by-step instructions. Availability and implementation: MetaOmics is freely available at https://github.com/metaOmics/metaOmics. Supplementary information: Supplementary data are available at Bioinformatics online.
Cancer Research | 2014
Swati Suryawanshi; Xin Huang; Raluca Budiu; SungHwan Kim; George C. Tseng; Esther Elishaev; Marcia Klein-Patel; T. Lee; Suketu Mansuria; Robert P. Edwards; Anda M. Vlad
Proceedings: AACR Annual Meeting 2014; April 5-9, 2014; San Diego, CA
Introduction: Endometriosis is a largely benign, chronic inflammatory disease defined by the presence of endometrial-like glands surrounded by stroma. Epidemiologic studies suggest that endometriosis is an independent risk factor for endometrioid and clear cell epithelial ovarian tumors, collectively called endometriosis-associated ovarian cancers (EAOC). Histopathology findings demonstrate that EAOC occur in the presence of atypical endometriosis (AE), often found in direct continuity with the tumor, suggesting AE as the transitioning entity from benign lesions to malignant variants. Although it is widely accepted that chronic inflammation drives cancer, immune deregulation in chronic precursor lesions to ovarian cancer has not been studied in a systematic way.
Experimental procedure: We extracted RNA from 135 paraffin tissue blocks comprising normal endometrium (n=32), benign endometriosis consisting of ovarian and extra-ovarian endometriosis cases (n=30), atypical endometriosis (n=15) and EAOC (n=43). Serous tumors (n=15) were included as non-endometriosis-associated controls. Using Nanostring and the nCounter® GX Human Immunology Kit, we profiled a total of 511 immune genes. Cluster analyses and differential expression (DE) were calculated using EdgeR. Protein expression of candidate proteins was validated with immunohistochemistry (IHC). To identify mechanisms of complement activation during early stages of genomic instability, we used murine cell lines derived from mice with conditional mutations in Kras and Pten pathways.
Results: Nanostring immune gene expression profiles reveal the predominant role of adaptive immunity and of the adaptive-innate immune cross-talk. The complement pathway was most prominently expressed, suggesting its role as one of the major immune pathways involved in the transition from chronic endometriosis to atypical (premalignant) endometriosis and to EAOC. Complement pathway genes were upregulated in human endometriosis and atypical endometriosis. Tissue deposition of several complement components and changes in peripheral blood were confirmed by IHC and ELISA, respectively. Genomic instability in murine ovarian cancer cell lines with conditional mutations resulted in upregulation of complement gene expression in epithelial cells, but without an advantage on cell death. These findings further support a paradigm shift on complement roles in cancer, suggesting its pro-tumorigenic roles.
Conclusions: We performed the first comprehensive gene profile of the tissue immune microenvironment in benign endometriosis, EAOC and precursor lesions and identified previously unrecognized roles for the complement pathway. Complement activation may be an early trigger of inflammation and an early link between epithelium and immune environment. These findings have high translational potential for immune therapy and prevention in EAOC.
Citation Format: Swati Maruti Suryawanshi, Xin Huang, Raluca Budiu, SungHwan Kim, George Tseng, Esther Elishaev, Marcia Klein-Patel, Ted Lee, Suketu Mansuria, Robert Edwards, Anda Vlad. Complement roles in endometriosis and endometriosis-associated ovarian cancer. [abstract]. In: Proceedings of the 105th Annual Meeting of the American Association for Cancer Research; 2014 Apr 5-9; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2014;74(19 Suppl):Abstract nr 1653. doi:10.1158/1538-7445.AM2014-1653
Archive | 2015
SungHwan Kim; Zhiguang Huo; Yongseok Park; George C. Tseng; Debashis Ghosh; Xianghong Jasmine Zhou