Publication


Featured research published by Eunseog Youn.


Anaerobe | 2010

Pyrosequencing study of fecal microflora of autistic and control children

Sydney M. Finegold; Scot E. Dowd; Viktoria Gontcharova; Chengxu Liu; Kathleen E. Henley; Randall D. Wolcott; Eunseog Youn; Paula H. Summanen; Doreen Granpeesheh; Dennis R. Dixon; Minghsun Liu; Denise Molitoris; John A. Green

There is evidence of genetic predisposition to autism, but the percent of autistic subjects with this background is unknown. It is clear that other factors, such as environmental influences, may play a role in this disease. In the present study, we have examined the fecal microbial flora of 33 subjects with various severities of autism with gastrointestinal symptoms, seven siblings not showing autistic symptoms (sibling controls) and eight non-sibling control subjects, using the bacterial tag encoded FLX amplicon pyrosequencing (bTEFAP) procedure. The results provide us with information on the microflora of stools of young children and a compelling picture of unique fecal microflora of children with autism with gastrointestinal symptomatology. Differences based upon maximum observed and maximum predicted operational taxonomic units were statistically significant when comparing autistic and control subjects with p-values ranging from <0.001 to 0.009 using both parametric and non-parametric estimators. At the phylum level, Bacteroidetes and Firmicutes showed the most difference between groups of varying severities of autism. Bacteroidetes were found at high levels in the severely autistic group, while Firmicutes were more predominant in the control group. Smaller, but significant, differences also occurred in the Actinobacteria and Proteobacteria phyla. Desulfovibrio species and Bacteroides vulgatus are present in significantly higher numbers in stools of severely autistic children than in controls. If the unique microbial flora is found to be a causative or consequent factor in this type of autism, it may have implications with regard to a specific diagnostic test, its epidemiology, and for treatment and prevention.


The Open Microbiology Journal | 2010

A Comparison of Bacterial Composition in Diabetic Ulcers and Contralateral Intact Skin

Viktoria Gontcharova; Eunseog Youn; Yan Sun; Randall D. Wolcott; Scot E. Dowd

An extensive portion of the healthcare budget is allocated to chronic human infection. Chronic wounds in particular are a major contributor to this financial burden. Little is known about the types of bacteria which may contribute to the chronicity, biofilm and overall bioburden of the wound itself. In this study we compare the bacteriology of wounds and associated intact skin. Wound and paired intact skin swabs (from a contralateral location) were collected. The bacterial diversity was determined using bacterial Tag-encoded FLX amplicon pyrosequencing (bTEFAP). Diversity analysis showed intact skin to be significantly more diverse than wounds on both the species and genus levels (3% and 5% divergence). Furthermore, wounds show heightened levels of anaerobic bacteria, like Peptoniphilus, Finegoldia, and Anaerococcus, and other detrimental genera such as Corynebacterium and Staphylococcus. Although some of these and other bacterial genera were found to be common between intact skin and wounds, notable opportunistic wound pathogens were found at lower levels in intact skin. Principal Component Analysis demonstrated a clear separability of the two groups. The findings of the study not only greatly support the hypothesis of differing bacterial composition of intact skin and wounds, but also contribute additional insight into the ecology of skin and wound microflora. The increased diversity and lowered levels of opportunistic pathogens found in skin make the system highly distinguishable from wounds.


Protein Science | 2006

Evaluation of features for catalytic residue prediction in novel folds

Eunseog Youn; Brandon Peters; Predrag Radivojac; Sean D. Mooney

Structural genomics projects are determining the three‐dimensional structure of proteins without full characterization of their function. A critical part of the annotation process involves appropriate knowledge representation and prediction of functionally important residue environments. We have developed a method to extract features from sequence, sequence alignments, three‐dimensional structure, and structural environment conservation, and used support vector machines to annotate homologous and nonhomologous residue positions based on a specific training set of residue functions. In order to evaluate this pipeline for automated protein annotation, we applied it to the challenging problem of prediction of catalytic residues in enzymes. We also ranked the features based on their ability to discriminate catalytic from noncatalytic residues. When applying our method to a well‐annotated set of protein structures, we found that top‐ranked features were a measure of sequence conservation, a measure of structural conservation, a degree of uniqueness of a residue's structural environment, solvent accessibility, and residue hydrophobicity. We also found that features based on structural conservation were complementary to those based on sequence conservation and that they were capable of increasing predictor performance. Using a family nonredundant version of the ASTRAL 40 v1.65 data set, we estimated that the true catalytic residues were correctly predicted in 57.0% of the cases, with a precision of 18.5%. When testing on proteins containing novel folds not used in training, the best features were highly correlated with the training on families, thus validating the approach to nonhomologous catalytic residue prediction in general. We then applied the method to 2781 coordinate files from the structural genomics target pipeline and identified both highly ranked and highly clustered groups of predicted catalytic residues.
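The two percentages in this abstract can be read as recall (the share of true catalytic residues that the predictor finds) and precision (the share of predicted residues that are truly catalytic); reading the 57.0% figure as recall is an interpretation, not something the abstract states in those words. The sketch below only illustrates the arithmetic. The function name and the confusion counts are hypothetical and are not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    tp: true positives  (catalytic residues predicted as catalytic)
    fp: false positives (noncatalytic residues predicted as catalytic)
    fn: false negatives (catalytic residues that were missed)
    """
    predicted = tp + fp   # everything the predictor flagged
    actual = tp + fn      # everything that is truly catalytic
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, chosen only for illustration (not data from the study).
precision, recall = precision_recall(tp=57, fp=251, fn=43)
print(f"recall={recall:.3f} precision={precision:.3f}")
```

With counts like these, recall is moderate while precision is low, which is the typical pattern when the positive class (catalytic residues) is rare compared with the negative class.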


Pattern Recognition Letters | 2009

Class dependent feature scaling method using naive Bayes classifier for text datamining

Eunseog Youn; Myong K. Jeong

The problem of feature selection is to find a subset of features for optimal classification. A critical part of feature selection is to rank features according to their importance for classification. The naive Bayes classifier has been extensively used in text categorization. We have developed a new feature scaling method, called class-dependent feature weighting (CDFW), using the naive Bayes (NB) classifier. A further feature scaling method, CDFW-NB-RFE, combines CDFW with recursive feature elimination (RFE). Our experimental results showed that CDFW-NB-RFE outperformed other popular feature ranking schemes used on text datasets.


BMC Genomics | 2008

Double feature selection and cluster analyses in mining of microarray data from cotton

Magdy S Alabady; Eunseog Youn; Thea A. Wilkins

Background: Cotton fiber is a single-celled seed trichome of major biological and economic importance. In recent years, genomic approaches such as microarray-based expression profiling were used to study fiber growth and development to understand the developmental mechanisms of fiber at the molecular level. The vast volume of microarray expression data generated requires a sophisticated means of data mining in order to extract novel information that addresses fundamental questions of biological interest. One of the ways to approach microarray data mining is to increase the number of dimensions/levels to the analysis, such as comparing independent studies from different genotypes. However, adding dimensions also creates a challenge in finding novel ways for analyzing multi-dimensional microarray data.

Results: Mining of independent microarray studies from Pima and Upland (TM1) cotton using double feature selection and cluster analyses identified species-specific and stage-specific gene transcripts that argue in favor of discrete genetic mechanisms that govern developmental programming of cotton fiber morphogenesis in these two cultivated species. Double feature selection analysis identified the highest number of differentially expressed genes that distinguish the fiber transcriptomes of developing Pima and TM1 fibers. These results were based on the finding that differences in fibers harvested between 17 and 24 days post-anthesis (dpa) represent the greatest expressional distance between the two species. This powerful selection method identified a subset of genes expressed during primary (PCW) and secondary (SCW) cell wall biogenesis in Pima fibers that exhibits an expression pattern that is generally reversed in TM1 at the same developmental stage. Cluster and functional analyses revealed that this subset of genes is primarily regulated during the transition stage that overlaps the termination of PCW and onset of SCW biogenesis, suggesting that these particular genes play a major role in the genetic mechanism that underlies the phenotypic differences in fiber traits between Pima and TM1.

Conclusion: The novel application of double feature selection analysis led to the discovery of species- and stage-specific genetic expression patterns, which are biologically relevant to the genetic programs that underlie the differences in the fiber phenotypes in Pima and TM1. These results promise to have profound impacts on the ongoing efforts to improve cotton fiber traits.


Expert Systems With Applications | 2010

Support vector-based feature selection using Fisher's linear discriminant and Support Vector Machine

Eunseog Youn; Lars Koenig; Myong K. Jeong; Seung H. Baek

The problem of feature selection is to find a subset of features for optimal classification. A critical part of feature selection is to rank features according to their importance for classification. The Support Vector Machine (SVM) has been applied to a number of applications, such as bioinformatics, face recognition, text categorization, handwritten digit recognition, and so forth. Based on the success of the SVM, several feature selection algorithms that use it have recently been proposed. This paper proposes a new feature-ranking algorithm based on support vectors (SVs). Support vectors refer to those sample vectors that lie around the decision boundary between two different classes. Although SV-based feature ranking can be applied to any discriminant analysis, two linear discriminants are considered here: Fisher's linear discriminant and the Support Vector Machine. Features are ranked based on the weight associated with each feature or as determined by recursive feature elimination. The experiments show that our feature-ranking algorithms are competitive in accuracy with the existing methods and much faster.
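The general idea of recursive feature elimination mentioned in this abstract can be sketched as a loop: train a linear discriminant, drop the feature with the smallest absolute weight, and repeat on the remaining features. The Python sketch below is a generic illustration only. It does not implement the paper's support-vector-based ranking; the trainer used here is a simple difference of class means, swapped in as a stand-in for Fisher's linear discriminant or an SVM so that the example needs no external libraries. All function names and the toy data are hypothetical.

```python
def mean_difference_weights(X, y):
    """Stand-in linear discriminant: weight_j = mean_j(class 1) - mean_j(class 0).

    This is NOT an SVM or Fisher's linear discriminant; it only supplies
    one weight per feature so the elimination loop can be demonstrated.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    n_features = len(X[0])
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(n_features)
    ]


def rfe_ranking(X, y, train_linear):
    """Rank feature indices from most to least important by recursive elimination."""
    remaining = list(range(len(X[0])))   # original indices still in play
    eliminated = []                      # filled from least important upward
    while remaining:
        # Retrain on the surviving columns only.
        sub = [[row[j] for j in remaining] for row in X]
        weights = train_linear(sub, y)
        # Remove the feature whose weight has the smallest magnitude.
        worst = min(range(len(remaining)), key=lambda k: abs(weights[k]))
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]


# Hypothetical toy data: feature 0 separates the classes strongly,
# feature 1 weakly, and feature 2 not at all.
X = [[0.0, 0.4, 1.0], [0.2, 0.6, 1.0], [2.0, 0.9, 1.0], [2.2, 1.1, 1.0]]
y = [0, 0, 1, 1]
ranking = rfe_ranking(X, y, mean_difference_weights)
print(ranking)
```

Passing the trainer as a function keeps the elimination loop independent of the classifier, so a real linear SVM or Fisher discriminant that returns one weight per feature could be substituted without changing the loop.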


PLOS ONE | 2012

Transgene Silencing and Transgene-Derived siRNA Production in Tobacco Plants Homozygous for an Introduced AtMYB90 Construct

Jeff Velten; Cahid Cakir; Eunseog Youn; Junping Chen; Christopher I. Cazzonelli

Transgenic tobacco (Nicotiana tabacum) lines were engineered to ectopically over-express AtMYB90 (PAP2), an R2–R3 Myb gene associated with regulation of anthocyanin production in Arabidopsis thaliana. Independently transformed transgenic lines, Myb27 and Myb237, accumulated large quantities of anthocyanin, generating a dark purple phenotype in nearly all tissues. After self-fertilization, some progeny of the Myb27 line displayed an unexpected pigmentation pattern, with most leaves displaying large sectors of dramatically reduced anthocyanin production. The green-sectored 27Hmo plants were all found to be homozygous for the transgene and, despite a doubled transgene dosage, to have reduced levels of AtMYB90 mRNA. The observed reduction in anthocyanin pigmentation and AtMYB90 mRNA was phenotypically identical to the patterns seen in leaves systemically silenced for the AtMYB90 transgene, and was associated with the presence of AtMYB90-derived siRNA homologous to both strands of a portion of the AtMYB90 transcribed region. Activation of transgene silencing in the Myb27 line was triggered when the 35S::AtMYB90 transgene dosage was doubled, in both Myb27 homozygotes, and in plants containing one copy of each of the independently segregating Myb27 and Myb237 transgene loci. Mapping of sequenced siRNA molecules to the Myb27 TDNA (including flanking tobacco sequences) indicated that the 3′ half of the AtMYB90 transcript is the primary target for siRNA associated silencing in both homozygous Myb27 plants and in systemically silenced tissues. The transgene within the Myb27 line was found to consist of a single, fully intact, copy of the AtMYB90 construct. Silencing appears to initiate in response to elevated levels of transgene mRNA (or an aberrant product thereof) present within a subset of leaf cells, followed by spread of the resulting small RNA to adjacent leaf tissues and subsequent amplification of siRNA production.


Methods in Molecular Biology | 2009

Connecting Protein Interaction Data, Mutations, and Disease Using Bioinformatics

Jake Y. Chen; Eunseog Youn; Sean D. Mooney

Understanding how mutations lead to changes in protein function and/or protein interaction is critical to understanding the molecular causes of clinical phenotypes. In this method, we present a path toward integration of protein interaction data and mutation data and then demonstrate the identification of a subset of proteins and interactions that are important to a particular disease. We then build a statistical model of disease mutations in this disease-associated subset of proteins, and visualize these results. Using Alzheimer's disease (AD) as a case implementation, we find that we are able to identify a subset of proteins involved in AD and discriminate disease-associated mutations from SNPs in these proteins with 83% accuracy. As the molecular causes of disease become more understood, models such as these will be useful for identifying candidate variants most likely to be causative.


Journal of the Operational Research Society | 2009

A two-stage classification procedure for near-infrared spectra based on multi-scale vertical energy wavelet thresholding and SVM-based gradient-recursive feature elimination

Hyun-Woo Cho; Seung H. Baek; Eunseog Youn; Myong K. Jeong; Adam Taylor

Near infrared (NIR) spectroscopy has been extensively used in classification problems because it is fast, reliable, cost-effective, and non-destructive. However, NIR data often have several hundred or thousand variables (wavelengths) that are highly correlated with each other. Thus, it is critical to select a few important features or wavelengths that better explain NIR data. Wavelets are popular as preprocessing tools for spectra data. Many applications perform feature selection directly, based on high-dimensional wavelet coefficients, and this can be computationally expensive. This paper proposes a two-stage scheme for the classification of NIR spectra data. In the first stage, the proposed multi-scale vertical energy thresholding procedure is used to reduce the dimension of the high-dimensional spectral data. In the second stage, a few important wavelet coefficients are selected using the proposed support vector machines gradient-recursive feature elimination. The proposed two-stage method has produced better classification performance, with higher computational efficiency, when tested on four NIR data sets.


Remote Sensing Letters | 2014

An automated soil line identification method using relevance vector machine

Song Cui; Nithya Rajan; Stephan J. Maas; Eunseog Youn

The soil line is an important concept that describes the linear relationship between reflectance of bare soils in the near-infrared (NIR) and red (R) spectral bands. Bare soil line parameters (slope and intercept) are used in calculating several vegetation indices. Previous studies have proposed both manual and empirical procedures in estimating the bare soil parameters. Manual procedures introduce some amount of subjectivity in identifying the soil line. Empirical methods often suffer because of variations caused by soil type, moisture, and organic matter contents. The existence of non-bare soil pixels also affects these procedures. In this study, we proposed an automated supervised learning algorithm using relevance vector machine (RVM) for extracting the soil line from Landsat images. The 10-fold cross validation (10-fold CV) indicated 92% accuracy for distinguishing bare soil and other non-bare soil pixels from an image. The area under the receiver operating characteristic (ROC) curve reached a value of 0.98, indicating a significant predicting power of the proposed procedure. Additionally, this procedure was evaluated using data from 10 bare soil fields in the Texas High Plains region in 2008 and 2009. Statistical analysis indicated no significant difference between the observed and estimated bare soil line parameters. The proposed RVM-based procedure successfully incorporated machine-learning algorithms into agricultural remote sensing and eliminated the dependency on empiricism and minimized subjectivity.
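The soil line described in this abstract is a straight line, NIR = slope × Red + intercept, fitted to bare soil pixels. The Python sketch below shows only that final fitting step, under the assumption that the bare soil pixels have already been identified (in the paper, that identification is the job of the relevance vector machine, which is not reproduced here). Ordinary least squares is used as an assumed fitting method; the function name and reflectance values are hypothetical.

```python
def fit_soil_line(red, nir):
    """Ordinary least-squares fit of NIR = slope * Red + intercept."""
    n = len(red)
    if n != len(nir) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_red = sum(red) / n
    mean_nir = sum(nir) / n
    # Sum of squared deviations in Red and the Red/NIR cross term.
    sxx = sum((r - mean_red) ** 2 for r in red)
    if sxx == 0:
        raise ValueError("red reflectance values are all identical")
    sxy = sum((r - mean_red) * (v - mean_nir) for r, v in zip(red, nir))
    slope = sxy / sxx
    intercept = mean_nir - slope * mean_red
    return slope, intercept


# Hypothetical reflectances for pixels already labeled as bare soil.
# They lie on the line nir = 1.2 * red + 0.02 by construction.
red = [0.10, 0.15, 0.20, 0.25, 0.30]
nir = [0.14, 0.20, 0.26, 0.32, 0.38]
slope, intercept = fit_soil_line(red, nir)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

Non-bare-soil pixels left in the input would pull the fitted line away from the true soil line, which is why the pixel classification step matters before any fit of this kind.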

Collaboration


Dive into Eunseog Youn's collaborations.

Top Co-Authors


Scot E. Dowd

Agricultural Research Service


Sean D. Mooney

University of Washington


Predrag Radivojac

Indiana University Bloomington


Song Cui

Middle Tennessee State University
