Publication


Featured research published by Kyung-Ah Sohn.


Bioinformatics | 2009

A multivariate regression approach to association analysis of a quantitative trait network

Seyoung Kim; Kyung-Ah Sohn; Eric P. Xing

Motivation: Many complex disease syndromes such as asthma consist of a large number of highly related, rather than independent, clinical phenotypes, raising a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. Although a causal genetic variation may influence a group of highly correlated traits jointly, most of the previous association analyses considered each phenotype separately, or combined results from a set of single-phenotype analyses. Results: We propose a new statistical framework called graph-guided fused lasso to address this issue in a principled way. Our approach represents the dependency structure among the quantitative traits explicitly as a network, and leverages this trait network to encode structured regularizations in a multivariate regression model over the genotypes and traits, so that the genetic markers that jointly influence subgroups of highly correlated traits can be detected with high sensitivity and specificity. While most of the traditional methods examined each phenotype independently, our approach analyzes all of the traits jointly in a single statistical method to discover the genetic markers that perturb a subset of correlated traits jointly rather than a single trait. Using simulated datasets based on the HapMap consortium data and an asthma dataset, we compare the performance of our method with the single-marker analysis, and other sparse regression methods that do not use any structural information in the traits. Our results show that there is a significant advantage in detecting the true causal single nucleotide polymorphisms when we incorporate the correlation pattern in traits using our proposed methods. Availability: Software for GFlasso is available at http://www.sailing.cs.cmu.edu/gflasso.html
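The structured regularization described in this abstract can be illustrated with a small sketch. The abstract does not state the objective, so the form below is an assumption: a squared-error loss, an ordinary lasso penalty, and a fusion penalty that pulls a marker's coefficients for two linked traits toward each other (or toward opposite signs when the traits are negatively correlated), weighted by the absolute correlation. The function name, the `edges` format, and the parameters `lam` and `gamma` are hypothetical and are not taken from the GFlasso software.

```python
def gflasso_objective(X, Y, B, edges, lam, gamma):
    """Evaluate an assumed graph-guided fused lasso objective.

    X: n x J genotype matrix, Y: n x K trait matrix,
    B: J x K coefficient matrix (all as lists of lists),
    edges: list of (m, l, r) with trait indices m, l and correlation r.
    """
    n, J, K = len(X), len(B), len(B[0])

    # Squared-error loss summed over all samples and traits.
    rss = 0.0
    for i in range(n):
        for k in range(K):
            pred = sum(X[i][j] * B[j][k] for j in range(J))
            rss += (Y[i][k] - pred) ** 2

    # Ordinary lasso penalty: encourages sparse marker effects.
    lasso = sum(abs(B[j][k]) for j in range(J) for k in range(K))

    # Fusion penalty over trait-network edges (assumed |r| weighting).
    fusion = 0.0
    for m, l, r in edges:
        sign = 1.0 if r >= 0 else -1.0
        fusion += abs(r) * sum(abs(B[j][m] - sign * B[j][l]) for j in range(J))

    return rss + lam * lasso + gamma * fusion
```

Under this sketch, a marker whose effects on two positively correlated traits are equal incurs no fusion penalty, while effects of opposite sign are penalized in proportion to the edge weight.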


International Conference on Machine Learning | 2006

Bayesian multi-population haplotype inference via a hierarchical Dirichlet process mixture

Eric P. Xing; Kyung-Ah Sohn; Michael I. Jordan; Yee Whye Teh

Uncovering the haplotypes of single nucleotide polymorphisms and their population demography is essential for many biological and medical applications. Methods for haplotype inference developed thus far (including methods based on coalescence, finite and infinite mixtures, and maximal parsimony) ignore the underlying population structure in the genotype data. As noted by Pritchard (2001), different populations can share a certain portion of their genetic ancestors, as well as have their own genetic components through migration and diversification. In this paper, we address the problem of multi-population haplotype inference. We capture cross-population structure using a nonparametric Bayesian prior known as the hierarchical Dirichlet process (HDP) (Teh et al., 2006), conjoining this prior with a recently developed Bayesian methodology for haplotype phasing known as DP-Haplotyper (Xing et al., 2004). We also develop an efficient sampling algorithm for the HDP based on a two-level nested Pólya urn scheme. We show that our model outperforms extant algorithms on both simulated and real biological data.
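The two-level nested Pólya urn scheme mentioned in this abstract can be pictured with a prior-only simulation. This is a sketch under stated assumptions, not the paper's sampler (which conditions on genotype data): each population has its own bottom-level urn, all populations share one top-level urn, and the concentration parameters `tau` and `gamma` as well as the function names are hypothetical.

```python
import random
from collections import Counter


def _weighted_choice(rng, urn):
    """Pick a founder id with probability proportional to its ball count."""
    ids = list(urn.keys())
    return rng.choices(ids, weights=[urn[k] for k in ids])[0]


def nested_polya_urn(n_draws_per_pop, tau, gamma, seed=0):
    """Simulate founder assignments from a two-level urn (prior only)."""
    rng = random.Random(seed)
    top = Counter()      # shared urn: founder id -> number of balls
    bottoms = []         # one urn per population
    n_founders = 0
    for n_draws in n_draws_per_pop:
        bottom = Counter()
        for _ in range(n_draws):
            total = sum(bottom.values())
            if rng.random() < total / (total + tau):
                # Reuse a founder already seen in this population.
                k = _weighted_choice(rng, bottom)
            else:
                # Consult the shared top-level urn.
                top_total = sum(top.values())
                if rng.random() < top_total / (top_total + gamma):
                    k = _weighted_choice(rng, top)   # founder shared across populations
                else:
                    k = n_founders                   # brand-new founder
                    n_founders += 1
                top[k] += 1
            bottom[k] += 1
        bottoms.append(bottom)
    return top, bottoms
```

For example, `nested_polya_urn([20, 20], tau=1.0, gamma=1.0)` returns the shared urn and one urn per population; founders that appear in both population urns correspond to ancestors shared across populations.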


Journal of the American Medical Informatics Association | 2014

Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction

Dokyoon Kim; Je-Gun Joung; Kyung-Ah Sohn; Hyunjung Shin; Yu Rang Park; Marylyn D. Ritchie; Ju Han Kim

Objective: Cancer can involve gene dysregulation via multiple mechanisms, so no single level of genomic data fully elucidates tumor behavior due to the presence of numerous genomic variations within or between levels in a biological system. We have previously proposed a graph-based integration approach that combines multi-omics data including copy number alteration, methylation, miRNA, and gene expression data for predicting clinical outcome in cancer. However, genomic features likely interact with other genomic features in complex signaling or regulatory networks, since cancer is caused by alterations in pathways or complete processes. Methods: Here we propose a new graph-based framework for integrating multi-omics data and genomic knowledge to improve power in predicting clinical outcomes and elucidate interplay between different levels. To highlight the validity of our proposed framework, we used an ovarian cancer dataset from The Cancer Genome Atlas for predicting stage, grade, and survival outcomes. Results: Integrating multi-omics data with genomic knowledge to construct pre-defined features resulted in higher performance in clinical outcome prediction and higher stability. For the grade outcome, the model with gene expression data produced an area under the receiver operating characteristic curve (AUC) of 0.7866. However, models of the integration with pathway, Gene Ontology, chromosomal gene set, and motif gene set consistently outperformed the model with genomic data only, attaining AUCs of 0.7873, 0.8433, 0.8254, and 0.8179, respectively. Conclusions: Integrating multi-omics data and genomic knowledge to improve understanding of molecular pathogenesis and underlying biology in cancer should improve diagnostic and prognostic indicators and the effectiveness of therapies.
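The AUC figures quoted in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes that quantity by direct pairwise comparison; the function name and the toy labels and scores are illustrative assumptions and have no connection to the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels; scores: predicted scores.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; rank-based formulas give the same value more efficiently on large datasets.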


The Annals of Applied Statistics | 2009

A hierarchical Dirichlet process mixture model for haplotype reconstruction from multi-population data

Kyung-Ah Sohn; Eric P. Xing

The perennial problem of "how many clusters?" remains an issue of substantial interest in data mining and machine learning communities, and becomes particularly salient in large data sets such as populational genomic data where the number of clusters needs to be relatively large and open-ended. This problem gets further complicated in a co-clustering scenario in which one needs to solve multiple clustering problems simultaneously because of the presence of common centroids (e.g., ancestors) shared by clusters (e.g., possible descendants of a certain ancestor) from different multiple-cluster samples (e.g., different human subpopulations). In this paper we present a hierarchical nonparametric Bayesian model to address this problem in the context of multi-population haplotype inference. Uncovering the haplotypes of single nucleotide polymorphisms is essential for many biological and medical applications. While it is not uncommon for the genotype data to be pooled from multiple ethnically distinct populations, few existing programs have explicitly leveraged the individual ethnic information for haplotype inference. In this paper we present a new haplotype inference program, Haploi, which makes use of such information and is readily applicable to genotype sequences with thousands of SNPs from heterogeneous populations, with competent and sometimes superior speed and accuracy compared to the state-of-the-art programs. Underlying Haploi is a new haplotype distribution model based on a nonparametric Bayesian formalism known as the hierarchical Dirichlet process, which represents a tractable surrogate to the coalescent process. The proposed model is exchangeable, unbounded, and capable of coupling demographic information of different populations. It offers a well-founded statistical framework for posterior inference of individual haplotypes, the size and configuration of haplotype ancestor pools, and other parameters of interest given genotype data.


Pacific Conference on Computer Graphics and Applications | 2002

Computing distances between surfaces using line geometry

Kyung-Ah Sohn; Bert Jüttler; Myung Soo Kim; Wenping Wang

We present an algorithm for computing the distance between two free-form surfaces. Using line geometry, the distance computation is reformulated as a simple instance of a surface-surface intersection problem, which leads to low-dimensional root finding in a system of equations. This approach produces an efficient algorithm for computing the distance between two ellipsoids, where the problem is reduced to finding a specific solution in a system of two equations in two variables. Similar algorithms can be designed for computing the distance between an ellipsoid and a simple surface (such as a cylinder, cone, or torus). In an experimental implementation (on a 500 MHz Windows PC), the distance between two ellipsoids was computed in less than 0.3 msec on average, and the distance between an ellipsoid and a simple convex surface was computed in less than 0.15 msec on average.


Intelligent Systems in Molecular Biology | 2007

Spectrum: joint Bayesian inference of population structure and recombination events

Kyung-Ah Sohn; Eric P. Xing

Motivation: While genetic properties such as linkage disequilibrium (LD) and population structure are closely related under a common inheritance process, the statistical methodologies developed so far mostly deal with LD analysis and structural inference separately, using specialized models that do not capture their statistical and genetic relationships. Also, most of these approaches ignore the inherent uncertainty in the genetic complexity of the data and rely on inflexible models built on a closed genetic space. These limitations may make it difficult to infer detailed and consistent structural information from rich genomic data such as populational single nucleotide polymorphism (SNP) profiles. Results: We propose a new model-based approach to address these issues through joint inference of population structure and recombination events under a non-parametric Bayesian framework; we present Spectrum, an efficient implementation based on our new model. We validated Spectrum on simulated data and applied it to two real SNP datasets, including single-population Daly data and the four-population HapMap data. Our method performs well relative to LDhat 2.0 in estimating the recombination rates and hotspots on these datasets. More interestingly, it generates an ancestral spectrum for representing population structures which not only displays sub-structure based on population founders but also reveals details of the genetic diversity of each individual. It offers an alternative view of the population structures to that offered by Structure 2.1, which ignores chromosome-level mutation and recombination with respect to founders.


Genetics | 2012

Robust Estimation of Local Genetic Ancestry in Admixed Populations Using a Nonparametric Bayesian Approach

Kyung-Ah Sohn; Zoubin Ghahramani; Eric P. Xing

We present a new haplotype-based approach for inferring local genetic ancestry of individuals in an admixed population. Most existing approaches for local ancestry estimation ignore the latent genetic relatedness between ancestral populations and treat them as independent. In this article, we exploit such information by building an inheritance model that describes both the ancestral populations and the admixed population jointly in a unified framework. Based on an assumption that the common hypothetical founder haplotypes give rise to both the ancestral and the admixed population haplotypes, we employ an infinite hidden Markov model to characterize each ancestral population and further extend it to generate the admixed population. Through an effective utilization of the population structural information under a principled nonparametric Bayesian framework, the resulting model is significantly less sensitive to the choice and the amount of training data for ancestral populations than state-of-the-art algorithms. We also improve the robustness under deviation from common modeling assumptions by incorporating population-specific scale parameters that allow variable recombination rates in different populations. Our method is applicable to an admixed population from an arbitrary number of ancestral populations and also performs competitively in terms of spurious ancestry proportions under a general multiway admixture assumption. We validate the proposed method by simulation under various admixing scenarios and present empirical analysis results from a worldwide-distributed dataset from the Human Genome Diversity Project.


Bayesian Analysis | 2007

Hidden Markov Dirichlet process: modeling genetic inference in open ancestral space

Eric P. Xing; Kyung-Ah Sohn

The problem of inferring the population structure, linkage disequilibrium pattern, and chromosomal recombination hotspots from genetic polymorphism data is essential for understanding the origin and characteristics of genome variations, with important applications to the genetic analysis of disease propensities and other complex traits. Statistical genetic methodologies developed so far mostly address these problems separately using specialized models ranging from coalescence and admixture models for population structures, to hidden Markov models and renewal processes for recombination; but most of these approaches ignore the inherent uncertainty in the genetic complexity (e.g., the number of genetic founders of a population) of the data and the close statistical and biological relationships among objects studied in these problems. We present a new statistical framework called hidden Markov Dirichlet process (HMDP) to jointly model the genetic recombinations among a possibly infinite number of founders and the coalescence-with-mutation events in the resulting genealogies. The HMDP posits that a haplotype of genetic markers is generated by a sequence of recombination events that select an ancestor for each locus from an unbounded set of founders according to a first-order Markov transition process. Conjoining this process with a mutation model, our method accommodates both between-lineage recombination and within-lineage sequence variations, and leads to a compact and natural interpretation of the population structure and inheritance process underlying haplotype data. We have developed an efficient sampling algorithm for HMDP based on a two-level nested Pólya urn scheme, and we present experimental results on joint inference of population structure, linkage disequilibrium, and recombination hotspots based on HMDP. On both simulated and real SNP haplotype data, our method performs competitively or significantly better than extant methods in uncovering the recombination hotspots along chromosomal loci; in addition, it infers the ancestral genetic patterns and offers a highly accurate map of ancestral compositions of modern populations.
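The generative story in this abstract (an ancestor chosen per locus by a first-order Markov process, followed by mutation) can be sketched in a heavily simplified form. The sketch below makes several assumptions that are not in the abstract: the founder set is finite and fixed rather than unbounded, a recombination re-draws the ancestor uniformly, alleles are binary, and the names `sample_haplotype`, `recomb_prob`, and `mutation_prob` are hypothetical.

```python
import random


def sample_haplotype(founders, recomb_prob, mutation_prob, seed=0):
    """Generate one haplotype as a mosaic of founder haplotypes.

    founders: list of equal-length lists of 0/1 alleles (finite, by assumption).
    Returns (path, haplotype): the ancestor index and the allele at each locus.
    """
    rng = random.Random(seed)
    n_loci = len(founders[0])
    ancestor = rng.randrange(len(founders))   # ancestor at the first locus
    path, haplotype = [], []
    for t in range(n_loci):
        # First-order Markov step: keep the previous ancestor unless a
        # recombination event occurs, in which case re-draw it.
        if t > 0 and rng.random() < recomb_prob:
            ancestor = rng.randrange(len(founders))
        allele = founders[ancestor][t]
        # Mutation model: flip the inherited allele with small probability.
        if rng.random() < mutation_prob:
            allele = 1 - allele
        path.append(ancestor)
        haplotype.append(allele)
    return path, haplotype
```

With `recomb_prob=0` and `mutation_prob=0` the output is an exact copy of a single founder; raising either parameter produces the mosaic and within-lineage variation that the abstract describes, with changes in `path` marking recombination points.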


BMC Genomics | 2012

MicroRNA-centric measurement improves functional enrichment analysis of co-expressed and differentially expressed microRNA clusters

Su Yeon Lee; Kyung-Ah Sohn; Ju Han Kim

Background: Functional annotations are available only for a very small fraction of microRNAs (miRNAs) and very few miRNA target genes are experimentally validated. Therefore, functional analysis of miRNA clusters has typically relied on computational target gene prediction followed by Gene Ontology and/or pathway analysis. These previous methods share the limitation that they do not consider the many-to-many-to-many tri-partite network topology between miRNAs, target genes, and functional annotations. Moreover, the highly false-positive nature of sequence-based target prediction algorithms causes propagation of annotation errors throughout the tri-partite network. Results: A new conceptual framework is proposed for functional analysis of miRNA clusters, which extends the conventional target gene-centric approaches to a more generalized tri-partite space. Under this framework, we construct miRNA-, target link-, and target gene-centric computational measures incorporating the whole tri-partite network topology. Each of these methods and all their possible combinations are evaluated on publicly available miRNA clusters and with a wide range of variations for miRNA-target gene relations. We find that the miRNA-centric measures outperform others in terms of the average specificity and functional homogeneity of the GO terms significantly enriched for each miRNA cluster. Conclusions: We propose novel miRNA-centric functional enrichment measures in a conceptual framework that connects the spaces of miRNAs, genes, and GO terms in a unified way. Our comprehensive evaluation result demonstrates that functional enrichment analysis of co-expressed and differentially expressed miRNA clusters can substantially benefit from the proposed miRNA-centric approaches.


Eurasip Journal on Image and Video Processing | 2007

View influence analysis and optimization for multiview face recognition

Won-Sook Lee; Kyung-Ah Sohn

We present a novel method to recognize a multiview face (i.e., to recognize a face under different views) through optimization of multiple single-view face recognitions. Many current face descriptors give quite satisfactory results in recognizing the identity of people from a limited view (especially the frontal view), but the full view of the human head has not yet been recognizable with commercially acceptable accuracy. As various single-view recognition techniques with very high success rates have already been developed, for instance, the MPEG-7 advanced face recognizer, we propose a new paradigm to facilitate multiview face recognition, not through a multiview face recognizer, but through multiple single-view recognizers. To retrieve faces in any view from a registered descriptor, we need to give corresponding view information to the descriptor. As the descriptor needs to provide any requested view in 3D space, we refer to the information it needs to contain as 3D information. Our analysis of various angled views checks the extent of each view's influence and provides a way to recognize a face through optimized integration of single-view descriptors covering the view plane of horizontal rotation from −90° to 90° and vertical rotation from −30° to 30°. The resulting face descriptor, based on multiple representative views and of compact size, shows reasonable face recognition performance on any view. Hence, our face descriptor contains enough 3D information about a person's face to help with recognition and eventually with search, retrieval, and browsing of photographs, videos, and 3D facial model databases.

Collaboration


Top Co-Authors

Eric P. Xing

Carnegie Mellon University

Seyoung Kim

Carnegie Mellon University

Ju Han Kim

Chonnam National University

Dokyoon Kim

Geisinger Health System

Marylyn D. Ritchie

Pennsylvania State University
