Seungchan Kim
Arizona State University
Publications

Featured research published by Seungchan Kim.
Bioinformatics | 2002
Ilya Shmulevich; Edward R. Dougherty; Seungchan Kim; Wei Zhang
MOTIVATION: Our goal is to construct a model for genetic regulatory networks such that the model class: (i) incorporates rule-based dependencies between genes; (ii) allows the systematic study of global network dynamics; (iii) is able to cope with uncertainty, both in the data and the model selection; and (iv) permits the quantification of the relative influence and sensitivity of genes in their interactions with other genes.
RESULTS: We introduce Probabilistic Boolean Networks (PBN) that share the appealing rule-based properties of Boolean networks, but are robust in the face of uncertainty. We show how the dynamics of these networks can be studied in the probabilistic context of Markov chains, with standard Boolean networks being special cases. Then, we discuss the relationship between PBNs and Bayesian networks, a family of graphical models that explicitly represent probabilistic relationships between variables. We show how probabilistic dependencies between a gene and its parent genes, constituting the basic building blocks of Bayesian networks, can be obtained from PBNs. Finally, we present methods for quantifying the influence of genes on other genes, within the context of PBNs. Examples illustrating the above concepts are presented throughout the paper.
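As a rough illustration of how PBN dynamics map onto a Markov chain, the sketch below builds the state-transition matrix of a toy two-gene network. The function name `build_transition_matrix`, the example predictors, and their selection probabilities are all hypothetical, and the sketch assumes predictors are selected independently for each gene; it is not the paper's own construction.

```python
import itertools
import numpy as np

def build_transition_matrix(predictors, n_genes):
    """Markov transition matrix of a toy PBN.

    predictors[i] is a list of (function, selection_probability) pairs for gene i.
    Assumption: one predictor per gene is drawn independently at every step.
    """
    states = list(itertools.product([0, 1], repeat=n_genes))
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        # Enumerate every way of choosing one predictor per gene.
        for combo in itertools.product(*predictors):
            prob = 1.0
            next_state = []
            for func, sel_prob in combo:
                prob *= sel_prob
                next_state.append(func(s))
            P[index[s], index[tuple(next_state)]] += prob
    return states, P

# Hypothetical network: gene 0 has two candidate rules, gene 1 has one.
predictors = [
    [(lambda s: s[0] & s[1], 0.7), (lambda s: s[1], 0.3)],  # gene 0
    [(lambda s: 1 - s[0], 1.0)],                            # gene 1
]
states, P = build_transition_matrix(predictors, n_genes=2)
```

If every gene has exactly one predictor with probability 1, each row of `P` contains a single 1, which corresponds to the standard Boolean network special case mentioned in the abstract.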
Signal Processing | 2000
Edward R. Dougherty; Seungchan Kim; Yidong Chen
For statistical design of an optimal filter, it is probabilistically advantageous to employ a large number of observation random variables; however, estimation error increases with the number of variables, so that variables not contributing to the determination of the target variable can have a detrimental effect. In linear filtering, determination involves the correlation coefficients among the input and target variables. This paper discusses use of the more general coefficient of determination in nonlinear filtering. The determination coefficient is defined in accordance with the degree to which a filter estimates a target variable beyond the degree to which the target variable is estimated by its mean. Filter constraint decreases the coefficient, but it also decreases estimation error in filter design. Because situations in which the sample is relatively small in comparison with the number of observation variables are of salient interest, estimation of the determination coefficient is considered in detail. One may be unable to obtain a good estimate of an optimal filter, but can nonetheless use rough estimates of the coefficient to find useful sets of observation variables. Since minimal-error estimation underlies determination, this material is at the interface of signal processing, computational learning, and pattern recognition. Several signal-processing factors impact application: the signal model, morphological operator representation, and desirable operator properties. In particular, the paper addresses the VC dimension of increasing operators in terms of their morphological kernel/basis representations. Two applications are considered: window size for restoring degraded binary images; finding sets of genes that have significant predictive capability relative to target genes in genomic regulation.
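The verbal definition of the determination coefficient above can be written compactly. The symbols here are assumptions chosen for illustration and are not taken from the abstract: $\varepsilon_{\mu}$ is the error of estimating the target variable by its mean alone, and $\varepsilon_{\mathrm{opt}}$ is the error of the optimal filter over the chosen observation variables.

```latex
\theta = \frac{\varepsilon_{\mu} - \varepsilon_{\mathrm{opt}}}{\varepsilon_{\mu}},
\qquad 0 \le \theta \le 1 \quad (\varepsilon_{\mu} > 0)
```

Under this reading, $\theta = 0$ means the observations add nothing beyond the mean, and $\theta = 1$ means the filter estimates the target without error. Constraining the filter class can only raise the attainable error, and hence can only lower $\theta$, which matches the abstract's remark that filter constraint decreases the coefficient.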
Cancer Research | 2007
Wee J. Chng; Shaji Kumar; Scott VanWier; Greg J. Ahmann; Tammy Price-Troska; Kim Henderson; Tae Hoon Chung; Seungchan Kim; George Mulligan; Barbara M. Bryant; John D. Carpten; Morie A. Gertz; S. Vincent Rajkumar; Martha Q. Lacy; Angela Dispenzieri; Robert A. Kyle; Philip R. Greipp; P. Leif Bergsagel; Rafael Fonseca
Hyperdiploid multiple myeloma (H-MM) is the most common form of myeloma. In this gene expression profiling study, we show that H-MM is defined by a protein biosynthesis signature that is primarily driven by a gene dosage mechanism as a result of trisomic chromosomes. Within H-MM, four independently validated patient clusters overexpressing nonoverlapping sets of genes that form cognate pathways/networks that have potential biological importance in multiple myeloma were identified. One prominent cluster, cluster 1, is characterized by high expression of cancer testis antigen and proliferation-associated genes. Tumors from these patients were more proliferative than tumors in other clusters (median plasma cell labeling index, 3.8; P < 0.05). Another cluster, cluster 3, is characterized by genes involved in tumor necrosis factor/nuclear factor-kappaB signaling and antiapoptosis. These patients have better response to bortezomib as compared with patients within other clusters (70% versus 29%; P = 0.02). Furthermore, for a group of patients generally thought to have better prognosis, a cluster of patients with short survival (cluster 1; median survival, 27 months) could be identified. This analysis illustrates the heterogeneity within H-MM and the importance of defining specific cytogenetic prognostic factors. Furthermore, the signatures that defined these clusters may provide a basis for tailoring treatment to individual patients.
Journal of Computational Biology | 2002
Edward R. Dougherty; Junior Barrera; Marcel Brun; Seungchan Kim; Roberto M. Cesar; Yidong Chen; Michael L. Bittner; Jeffrey M. Trent
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.
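The evaluation loop described in this abstract (generate points as mean plus independent noise, cluster them, count points assigned against their generating process) can be sketched as follows. This is a minimal illustration, not the toolbox itself: the plain K-means routine, the function names, the Gaussian noise, and the model means are all assumptions, and the error is reported as a fraction after matching cluster ids to classes by the best permutation.

```python
import itertools
import numpy as np

def clustering_error(true_labels, cluster_labels, k):
    """Fraction of points misassigned under the best matching of cluster ids to classes."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    best = len(true_labels)
    for perm in itertools.permutations(range(k)):
        mapped = np.array(perm)[cluster_labels]  # relabel clusters by this permutation
        best = min(best, int(np.sum(mapped != true_labels)))
    return best / len(true_labels)

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd-style K-means, used here only as a stand-in clustering algorithm."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center where it is
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Hypothetical model: two processes, each a mean profile plus independent Gaussian noise.
rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0, 0.0, 0.0], [3.0, 3.0, 3.0, 3.0]])
true_labels = np.repeat([0, 1], 20)
points = means[true_labels] + rng.normal(scale=0.5, size=(40, 4))

labels = kmeans(points, k=2)
error = clustering_error(true_labels, labels, k=2)
```

Repeating the last block over a range of noise scales or replicate counts would give the kind of error-versus-variance curves the abstract refers to.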
Journal of Biological Systems | 2002
Seungchan Kim; Huai Li; Edward R. Dougherty; Nanwei Cao; Yidong Chen; Michael L. Bittner; Edward Suh
A fundamental question in biology is whether the network of interactions that regulate gene expression can be modeled by existing mathematical techniques. Studies of the ability to predict a genes...
Journal of Biomedical Optics | 2001
Seungchan Kim; Edward R. Dougherty
A cDNA microarray is a complex biochemical-optical system whose purpose is the simultaneous measurement of gene expression for thousands of genes. In this paper we propose a general statistical approach to finding associations between the expression patterns of genes via the coefficient of determination. This coefficient measures the degree to which the transcriptional levels of an observed gene set can be used to improve the prediction of the transcriptional state of a target gene relative to the best possible prediction in the absence of observations. The method allows incorporation of knowledge of other conditions relevant to the prediction, such as the application of particular stimuli or the presence of inactivating gene mutations, as predictive elements affecting the expression level of a given gene. Various aspects of the method are discussed: prediction quantification, unconstrained prediction, constrained prediction using ternary perceptrons, and design of predictors given small numbers of replicated microarrays. The method is applied to a set of genes undergoing genotoxic stress for validation according to the manner in which it points toward previously known and unknown relationships. The entire procedure is supported by software that can be applied to large gene sets, has a number of facilities to simplify data analysis, and provides graphics for visualizing experimental data, multiple gene interaction, and prediction logic.
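To make the coefficient concrete for discretized expression data, the sketch below computes a plug-in estimate on ternary values (-1, 0, 1). Several things here are assumptions rather than the paper's procedure: the function name `cod_estimate`, the use of misclassification counts as the error, the unconstrained table-lookup predictor, and the toy data. Because error is measured on the same samples used to build the predictor, this estimate is optimistic for small samples.

```python
from collections import Counter, defaultdict

def cod_estimate(observations, targets):
    """Plug-in (resubstitution) estimate of the coefficient of determination."""
    n = len(targets)
    # Error without observations: always predict the most frequent target value.
    baseline_errors = n - Counter(targets).most_common(1)[0][1]
    if baseline_errors == 0:
        return 0.0  # target is constant, so there is nothing left to explain
    # Error with observations: predict the majority target value for each observed pattern.
    by_pattern = defaultdict(Counter)
    for x, y in zip(observations, targets):
        by_pattern[tuple(x)][y] += 1
    predictor_errors = sum(
        sum(counts.values()) - counts.most_common(1)[0][1]
        for counts in by_pattern.values()
    )
    return (baseline_errors - predictor_errors) / baseline_errors

# Hypothetical ternary expression states for two predictor genes and one target gene.
observations = [(-1, 0), (-1, 0), (0, 1), (0, 1), (1, 1), (1, 1), (1, -1), (0, 1)]
targets = [-1, -1, 0, 0, 1, 1, 1, 1]
cod = cod_estimate(observations, targets)  # (4 - 1) / 4 = 0.75
```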
International Journal of Cancer | 2011
Shilpi Arora; Aarati R. Ranade; Nhan L. Tran; Sara Nasser; Shravan Sridhar; Ronald L. Korn; Julianna T.D. Ross; Harshil Dhruv; Kristen M. Foss; Zita Sibenaller; Timothy C. Ryken; Michael B. Gotway; Seungchan Kim; Glen J. Weiss
Brain metastasis (BM) can affect approximately 25% of non-small cell lung cancer (NSCLC) patients during their lifetime. Efforts to characterize patients that will develop BM have been disappointing. microRNAs (miRNAs) regulate the expression of target mRNAs. miRNAs play a role in regulating a variety of targets and, consequently, multiple pathways, which make them a powerful tool for early detection of disease, risk assessment, and prognosis. We investigated miRNAs that may serve as biomarkers to differentiate between NSCLC patients with and without BM. miRNA microarray profiling was performed on samples from clinically matched NSCLC from seven patients with BM (BM+) and six without BM (BM−). Using t‐test and further qRT‐PCR validation, eight miRNAs were confirmed to be significantly differentially expressed. Of these, expression of miR‐328 and miR‐330‐3p was able to correctly classify BM+ vs. BM− patients. This classifier was used on a validation cohort (n = 15), and it correctly classified 12/15 patients. Gene expression analysis comparing A549 parental and A549 cells stably transfected to over‐express miR‐328 (A549‐328) identified several significantly differentially expressed genes. PRKCA was one of the genes over‐expressed in A549‐328 cells. Additionally, A549‐328 cells had significantly increased cell migration compared to A549 cells, which was significantly reduced upon PRKCA knockdown. In summary, miR‐328 has a role in conferring migratory potential to NSCLC cells, working in part through PRKCA, and with further corroboration in additional independent cohorts, these miRNAs may be incorporated into clinical treatment decision making to stratify NSCLC patients at higher risk for developing BM.
Journal of Computational Biology | 2002
Seungchan Kim; Edward R. Dougherty; Junior Barrera; Yidong Chen; Michael L. Bittner; Jeffrey M. Trent
For small samples, classifier design algorithms typically suffer from overfitting. Given a set of features, a classifier must be designed and its error estimated. For small samples, an error estimator may be unbiased but, owing to a large variance, often give very optimistic estimates. This paper proposes mitigating the small-sample problem by designing classifiers from a probability distribution resulting from spreading the mass of the sample points to make classification more difficult, while maintaining sample geometry. The algorithm is parameterized by the variance of the spreading distribution. By increasing the spread, the algorithm finds gene sets whose classification accuracy remains strong relative to greater spreading of the sample. The error gives a measure of the strength of the feature set as a function of the spread. The algorithm yields feature sets that can distinguish the two classes, not only for the sample data, but for distributions spread beyond the sample data. For linear classifiers, the topic of the present paper, the classifiers are derived analytically from the model, thereby providing an enormous savings in computation time. The algorithm is applied to cancer classification via cDNA microarrays. In particular, the genes BRCA1 and BRCA2 are associated with a hereditary disposition to breast cancer, and the algorithm is used to find gene sets whose expressions can be used to classify BRCA1 and BRCA2 tumors.
Journal of Biomedical Informatics | 2009
Luis Tari; Chitta Baral; Seungchan Kim
We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
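For context, the sketch below implements the standard fuzzy c-means algorithm that the method builds on. It does not include the Gene Ontology guidance described in the abstract, and the function name, parameter defaults, and toy data are assumptions. It shows how each gene receives a graded membership in every cluster rather than a single hard assignment.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means; returns memberships U (n x c) and cluster centers (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m                      # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)           # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

# Hypothetical expression profiles: 6 genes measured under 3 conditions.
X = np.array([
    [0.1, 0.2, 0.1],
    [0.0, 0.1, 0.2],
    [2.0, 2.1, 1.9],
    [2.2, 1.9, 2.0],
    [1.0, 1.1, 1.0],  # lies between the two groups
    [0.2, 0.0, 0.1],
])
U, centers = fuzzy_c_means(X, c=2)
```

A gene whose profile lies between two groups, like the fifth row here, ends up with appreciable membership in both clusters, which is the property the abstract highlights.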
Current Genomics | 2009
Edward R. Dougherty; Yufei Huang; Seungchan Kim; Xiaodong Cai; Rui Yamaguchi
Signal processing has played a major auxiliary role in medicine via the array of technologies available to physicians. Only a rapidly diminishing proportion of the population can recall medicine without computer tomography, magnetic resonance imaging, and ultrasound. In this capacity, signal processing serves only a supporting function. The future will be different. Like a factory, regulatory logic defines the cell as an operational system [1]: The roles of regulatory logic in the factory (or complex machine) and the cell are congruent because the key to the characterization of this logic lies in communication (between components) and control (of components), that is, in systems theory, which therefore determines the epistemology of the cell. Ipso facto, the mathematical foundations of biology, and therefore its translational partner, medicine, reside in the mathematics of systems theory. Hence, the roles of signal processing and the closely related theories of communication, control, and information will play constitutive functions as medicine evolves into a translational science resting on a theoretical framework.
