Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Anna Okula Basile is active.

Publication


Featured researches published by Anna Okula Basile.


Nature Communications | 2017

PLATO software provides analytic framework for investigating complexity beyond genome-wide association studies

Molly A. Hall; John R. Wallace; Anastasia Lucas; Dokyoon Kim; Anna Okula Basile; Shefali S. Verma; Catherine A. McCarty; Murray H. Brilliant; Peggy L. Peissig; Terrie Kitchner; Anurag Verma; Sarah A. Pendergrass; Scott M. Dudek; Jason H. Moore; Marylyn D. Ritchie

Genome-wide, imputed, sequence, and structural data are now available for exceedingly large sample sizes. The needs for data management, handling population structure and related samples, and performing associations have largely been met. However, the infrastructure to support analyses involving complexity beyond genome-wide association studies is not standardized or centralized. We provide the PLatform for the Analysis, Translation, and Organization of large-scale data (PLATO), a software tool equipped to handle multi-omic data for hundreds of thousands of samples to explore complexity using genetic interactions, environment-wide association studies and gene–environment interactions, phenome-wide association studies, as well as copy number and rare variant analyses. Using the data from the Marshfield Personalized Medicine Research Project, a site in the electronic Medical Records and Genomics Network, we apply each feature of PLATO to type 2 diabetes and demonstrate how PLATO can be used to uncover the complex etiology of common traits.Centralized infrastructure to support analyses involving complexity beyond genome-wide association studies is broadly needed. Here, Ritchie and colleagues develop PLATO, a software tool to process and integrate various methods for this task.


PLOS ONE | 2016

Phenome-Wide Association Study to Explore Relationships between Immune System Related Genetic Loci and Complex Traits and Diseases.

Anurag Verma; Anna Okula Basile; Yuki Bradford; Helena Kuivaniemi; Gerard Tromp; David J. Carey; Glenn S. Gerhard; James E. Crowe; Marylyn D. Ritchie; Sarah A. Pendergrass

We performed a Phenome-Wide Association Study (PheWAS) to identify interrelationships between the immune system genetic architecture and a wide array of phenotypes from two de-identified electronic health record (EHR) biorepositories. We selected variants within genes encoding critical factors in the immune system and variants with known associations with autoimmunity. To define case/control status for EHR diagnoses, we used International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes from 3,024 Geisinger Clinic MyCode® subjects (470 diagnoses) and 2,899 Vanderbilt University Medical Center BioVU biorepository subjects (380 diagnoses). A pooled-analysis was also carried out for the replicating results of the two data sets. We identified new associations with potential biological relevance including SNPs in tumor necrosis factor (TNF) and ankyrin-related genes associated with acute and chronic sinusitis and acute respiratory tract infection. The two most significant associations identified were for the C6orf10 SNP rs6910071 and “rheumatoid arthritis” (ICD-9 code category 714) (pMETAL = 2.58 x 10−9) and the ATN1 SNP rs2239167 and “diabetes mellitus, type 2” (ICD-9 code category 250) (pMETAL = 6.39 x 10−9). This study highlights the utility of using PheWAS in conjunction with EHRs to discover new genotypic-phenotypic associations for immune-system related genetic loci.


Biodata Mining | 2016

A biologically informed method for detecting rare variant associations

Carrie B. Moore; Anna Okula Basile; John R. Wallace; Alex T. Frase; Marylyn D. Ritchie

BackgroundBioBin is a bioinformatics software package developed to automate the process of binning rare variants into groups for statistical association analysis using a biological knowledge-driven framework. BioBin collapses variants into biological features such as genes, pathways, evolutionary conserved regions (ECRs), protein families, regulatory regions, and others based on user-designated parameters. BioBin provides the infrastructure to create complex and interesting hypotheses in an automated fashion thereby circumventing the necessity for advanced and time consuming scripting.Purpose of the studyIn this manuscript, we describe the software package for BioBin, along with type I error and power simulations to demonstrate the strengths and various customizable features and analysis options of this variant binning tool.ResultsSimulation testing highlights the utility of BioBin as a fast, comprehensive and expandable tool for the biologically-inspired binning and analysis of low-frequency variants in sequence data.Conclusions and potential implicationsThe BioBin software package has the capability to transform and streamline the analysis pipelines for researchers analyzing rare variants. This automated bioinformatics tool minimizes the manual effort of creating genomic regions for binning such that time can be spent on the much more interesting task of statistical analyses. This software package is open source and freely available from http://ritchielab.com/software/biobin-download


pacific symposium on biocomputing | 2016

KNOWLEDGE DRIVEN BINNING AND PHEWAS ANALYSIS IN MARSHFIELD PERSONALIZED MEDICINE RESEARCH PROJECT USING BIOBIN

Anna Okula Basile; John R. Wallace; Peggy L. Peissig; Catherine A. McCarty; Murray H. Brilliant; Marylyn D. Ritchie

Next-generation sequencing technology has presented an opportunity for rare variant discovery and association of these variants with disease. To address the challenges of rare variant analysis, multiple statistical methods have been developed for combining rare variants to increase statistical power for detecting associations. BioBin is an automated tool that expands on collapsing/binning methods by performing multi-level variant aggregation with a flexible, biologically informed binning strategy using an internal biorepository, the Library of Knowledge (LOKI). The databases within LOKI provide variant details, regional annotations and pathway interactions which can be used to generate bins of biologically-related variants, thereby increasing the power of any subsequent statistical test. In this study, we expand the framework of BioBin to incorporate statistical tests, including a dispersion-based test, SKAT, thereby providing the option of performing a unified collapsing and statistical rare variant analysis in one tool. Extensive simulation studies performed on gene-coding regions showed a Bin-KAT analysis to have greater power than BioBin-regression in all simulated conditions, including variants influencing the phenotype in the same direction, a scenario where burden tests often retain greater power. The use of Madsen- Browning variant weighting increased power in the burden analysis to that equitable with Bin-KAT; but overall Bin-KAT retained equivalent or higher power under all conditions. Bin-KAT was applied to a study of 82 pharmacogenes sequenced in the Marshfield Personalized Medicine Research Project (PMRP). We looked for association of these genes with 9 different phenotypes extracted from the electronic health record. This study demonstrates that Bin-KAT is a powerful tool for the identification of genes harboring low frequency variants for complex phenotypes.


BMC Medical Informatics and Decision Making | 2017

Knowledge-driven binning approach for rare variant association analysis: application to neuroimaging biomarkers in Alzheimer’s disease

Dokyoon Kim; Anna Okula Basile; Lisa Bang; Emrin Horgusluoglu; Seunggeun Lee; Marylyn D. Ritchie; Andrew J. Saykin; Kwangsik Nho

BackgroundRapid advancement of next generation sequencing technologies such as whole genome sequencing (WGS) has facilitated the search for genetic factors that influence disease risk in the field of human genetics. To identify rare variants associated with human diseases or traits, an efficient genome-wide binning approach is needed. In this study we developed a novel biological knowledge-based binning approach for rare-variant association analysis and then applied the approach to structural neuroimaging endophenotypes related to late-onset Alzheimer’s disease (LOAD).MethodsFor rare-variant analysis, we used the knowledge-driven binning approach implemented in Bin-KAT, an automated tool, that provides 1) binning/collapsing methods for multi-level variant aggregation with a flexible, biologically informed binning strategy and 2) an option of performing unified collapsing and statistical rare variant analyses in one tool. A total of 750 non-Hispanic Caucasian participants from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort who had both WGS data and magnetic resonance imaging (MRI) scans were used in this study. Mean bilateral cortical thickness of the entorhinal cortex extracted from MRI scans was used as an AD-related neuroimaging endophenotype. SKAT was used for a genome-wide gene- and region-based association analysis of rare variants (MAF (minor allele frequency) < 0.05) and potential confounding factors (age, gender, years of education, intracranial volume (ICV) and MRI field strength) for entorhinal cortex thickness were used as covariates. Significant associations were determined using FDR adjustment for multiple comparisons.ResultsOur knowledge-driven binning approach identified 16 functional exonic rare variants in FANCC significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In addition, the approach identified 7 evolutionary conserved regions, which were mapped to FAF1, RFX7, LYPLAL1 and GOLGA3, significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In further analysis, the functional exonic rare variants in FANCC were also significantly associated with hippocampal volume and cerebrospinal fluid (CSF) Aβ1–42 (p-value < 0.05).ConclusionsOur novel binning approach identified rare variants in FANCC as well as 7 evolutionary conserved regions significantly associated with a LOAD-related neuroimaging endophenotype. FANCC (fanconi anemia complementation group C) has been shown to modulate TLR and p38 MAPK-dependent expression of IL-1β in macrophages. Our results warrant further investigation in a larger independent cohort and demonstrate that the biological knowledge-driven binning approach is a powerful strategy to identify rare variants associated with AD and other complex disease.


Proceedings of the Pacific Symposium | 2018

Session Introduction: Challenges of Pattern Recognition in Biomedical Data

Shefali S. Verma; Anurag Verma; Anna Okula Basile; Marta-Byrska Bishop; Christian Darabos

The analysis of large biomedical data often presents with various challenges related to not just the size of the data, but also to data quality issues such as heterogeneity, multidimensionality, noisiness, and incompleteness of the data. The data-intensive nature of computational genomics problems in biomedical informatics warrants the development and use of massive computer infrastructure and advanced software tools and platforms, including but not limited to the use of cloud computing. Our session aims to address these challenges in handling big data for designing a study, performing analysis, and interpreting outcomes of these analyses. These challenges have been prevalent in many studies including those which focus on the identification of novel genetic variant-phenotype associations using data from sources like Electronic Health Records (EHRs) or multi-omic data. One of the biggest challenges to focus on is the imperfect nature of the biomedical data where a lot of noise and sparseness is observed. In our session, we will present research articles that can help in identifying innovative ways to recognize and overcome newly arising challenges associated with pattern recognition in biomedical data.


Expert Review of Molecular Diagnostics | 2018

Informatics and machine learning to define the phenotype

Anna Okula Basile; Marylyn D. Ritchie

ABSTRACT Introduction: For the past decade, the focus of complex disease research has been the genotype. From technological advancements to the development of analysis methods, great progress has been made. However, advances in our definition of the phenotype have remained stagnant. Phenotype characterization has recently emerged as an exciting area of informatics and machine learning. The copious amounts of diverse biomedical data that have been collected may be leveraged with data-driven approaches to elucidate trait-related features and patterns. Areas covered: In this review, the authors discuss the phenotype in traditional genetic associations and the challenges this has imposed.Approaches for phenotype refinement that can aid in more accurate characterization of traits are also discussed. Further, the authors highlight promising machine learning approaches for establishing a phenotype and the challenges of electronic health record (EHR)-derived data. Expert commentary: The authors hypothesize that through unsupervised machine learning, data-driven approaches can be used to define phenotypes rather than relying on expert clinician knowledge. Through the use of machine learning and an unbiased set of features extracted from clinical repositories, researchers will have the potential to further understand complex traits and identify patient subgroups. This knowledge may lead to more preventative and precise clinical care.


Bioinformatics | 2018

Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants

Anna Okula Basile; Marta Byrska-Bishop; John R. Wallace; Alex T. Frase; Marylyn D. Ritchie

Motivation BioBin is an automated bioinformatics tool for the multi‐level biological binning of sequence variants. Herein, we present a significant update to BioBin which expands the software to facilitate a comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, as well as incorporating novel analysis features providing for a robust, highly customizable, and unified rare variant analysis tool. Availability and implementation The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin‐download


arXiv: Quantitative Methods | 2017

An Unsupervised Homogenization Pipeline for Clustering Similar Patients using Electronic Health Record Data

Alvaro Ulloa; Anna Okula Basile; Gregory J. Wehner; Linyuan Jing; Marylyn D. Ritchie; Brett Beaulieu-Jones; Christopher M. Haggerty; Brandon K Fornwalt


Proceedings of the Pacific Symposium | 2017

PATTERNS IN BIOMEDICAL DATA-HOW DO WE FIND THEM?

Anna Okula Basile; Anurag Verma; Marta Byrska-Bishop; Sarah A. Pendergrass; Christian Darabos; H. Lester Kirchner

Collaboration


Dive into the Anna Okula Basile's collaboration.

Top Co-Authors

Avatar

Marylyn D. Ritchie

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar

Anurag Verma

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar

John R. Wallace

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar

Dokyoon Kim

Geisinger Health System

View shared research outputs
Top Co-Authors

Avatar

Sarah A. Pendergrass

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar

Alex T. Frase

Pennsylvania State University

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge