Olga A. Vsevolozhskaya
University of Kentucky
Publications
Featured research published by Olga A. Vsevolozhskaya.
Genetic Epidemiology | 2016
Olga A. Vsevolozhskaya; Dmitri V. Zaykin; David A. Barondess; Xiaoren Tong; Sneha Jadhav; Qing Lu
Recent technological advances equipped researchers with capabilities that go beyond traditional genotyping of loci known to be polymorphic in a general population. Genetic sequences of study participants can now be assessed directly. This capability removed technology‐driven bias toward scoring predominantly common polymorphisms and let researchers reveal a wealth of rare and sample‐specific variants. Although the relative contributions of rare and common polymorphisms to trait variation are being debated, researchers are faced with the need for new statistical tools for simultaneous evaluation of all variants within a region. Several research groups demonstrated flexibility and good statistical power of the functional linear model approach. In this work we extend previous developments to allow inclusion of multiple traits and adjustment for additional covariates. Our functional approach is unique in that it provides a nuanced depiction of effects and interactions for the variables in the model by representing them as curves varying over a genetic region. We demonstrate flexibility and competitive power of our approach by contrasting its performance with commonly used statistical tools and illustrate its potential for discovery and characterization of genetic architecture of complex traits using sequencing data from the Dallas Heart Study.
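The functional approach described above can be sketched in a few lines: represent the per-variant effect as a smooth curve beta(t) over variant positions via a basis expansion, and regress the trait on basis-projected genotype scores. The code below is a minimal illustration on simulated data, with all names and settings (positions, G, y, the B-spline configuration) assumed for the example; it is not the authors' implementation, which additionally handles multiple traits and covariates.

```python
# Minimal sketch (not the authors' code) of a functional linear model for a
# genetic region: per-variant effects are represented as a smooth curve
# beta(t) over variant positions via a B-spline basis expansion.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
n, m = 500, 40                                        # subjects, variants in the region
positions = np.sort(rng.uniform(0, 1, m))             # variant positions scaled to [0, 1]
G = rng.binomial(2, 0.05, size=(n, m)).astype(float)  # rare-variant genotypes
y = G[:, m // 2] * 0.8 + rng.normal(size=n)           # one causal variant, for illustration

# B-spline basis evaluated at variant positions: beta(t) = B(t) @ c
n_basis, degree = 8, 3
knots = np.r_[[0.0] * degree, np.linspace(0, 1, n_basis - degree + 1), [1.0] * degree]
B = BSpline.design_matrix(positions, knots, degree).toarray()  # shape (m, n_basis)

# Regress y on the smoothed genotype scores X = G @ B; c are basis coefficients
X = G @ B
c, *_ = np.linalg.lstsq(np.c_[np.ones(n), X], y, rcond=None)
beta_curve = B @ c[1:]          # estimated effect curve over the region
print(np.round(beta_curve, 2))  # peaks near the causal variant's position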
Translational Psychiatry | 2017
Olga A. Vsevolozhskaya; Gabriel Ruiz; Dmitri V. Zaykin
Increased availability of data and accessibility of computational tools in recent years have created an unprecedented upsurge of scientific studies driven by statistical analysis. Limitations inherent to statistics impose constraints on the reliability of conclusions drawn from data, so misuse of statistical methods is a growing concern. Hypothesis and significance testing, and the accompanying P-values, are being scrutinized as representing the most widely applied and abused practices. One line of critique is that P-values are inherently unfit to fulfill their ostensible role as measures of credibility for scientific hypotheses. It has also been suggested that while P-values may have their role as summary measures of effect, researchers underappreciate the degree of randomness in the P-value. High variability of P-values would suggest that, having obtained a small P-value in one study, one is nevertheless still likely to obtain a much larger P-value in a similarly powered replication study. Thus, “replicability of P-value” is in itself questionable. To characterize P-value variability, one can use prediction intervals whose endpoints reflect the likely spread of P-values that could have been obtained by a replication study. Unfortunately, the intervals currently in use, the frequentist P-intervals, are based on unrealistic implicit assumptions. Namely, P-intervals are constructed under assumptions that imply substantial chances of encountering large values of effect size in an observational study, which leads to bias. The long-run frequentist probability provided by P-intervals is similar in interpretation to that of classical confidence intervals, but the endpoints of any particular interval lack interpretation as probabilistic bounds for the possible spread of future P-values that may have been obtained in replication studies. Along with classical frequentist intervals, there exists a Bayesian viewpoint toward interval construction in which the endpoints of an interval have a meaningful probabilistic interpretation. We propose Bayesian intervals for prediction of P-value variability in prospective replication studies. Contingent upon approximate prior knowledge of the effect size distribution, our proposed Bayesian intervals have endpoints that are directly interpretable as probabilistic bounds for replication P-values, and they are resistant to selection bias. We showcase our approach by applying it to P-values reported for five psychiatric disorders by the Psychiatric Genomics Consortium group.
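As a rough illustration of the distinction the abstract draws, the sketch below simulates the spread of replication P-values two ways: a frequentist plug-in construction that treats the observed effect as the true one, and a Bayesian predictive construction with a normal prior on the standardized effect. All specifics (the z-test setting, tau = 1, the 90% level) are assumptions made for illustration, not the paper's actual intervals.

```python
# Assumption-laden sketch (not the paper's method) of predicting the spread of
# replication P-values from an observed two-sided z-test P-value, under
# (a) the plug-in assumption that the observed effect equals the true effect and
# (b) a Bayesian normal prior on the standardized effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p_obs = 1e-4
z_obs = stats.norm.isf(p_obs / 2)  # observed |z| for a two-sided test

def p_interval(z_draws, level=0.90):
    # convert simulated replication z-statistics to two-sided P-values
    p_rep = 2 * stats.norm.sf(np.abs(z_draws))
    return tuple(np.quantile(p_rep, [(1 - level) / 2, (1 + level) / 2]))

# (a) frequentist plug-in: true standardized effect taken to be z_obs exactly
z_freq = rng.normal(loc=z_obs, scale=1.0, size=100_000)

# (b) Bayesian: prior mu ~ Normal(0, tau^2) on the standardized effect;
#     posterior of mu given z_obs is Normal(post_mean, post_var)
tau = 1.0                             # prior scale: an assumed value
post_var = tau**2 / (tau**2 + 1)
post_mean = post_var * z_obs
mu_draws = rng.normal(post_mean, np.sqrt(post_var), size=100_000)
z_bayes = rng.normal(loc=mu_draws, scale=1.0)  # predictive replication z

print("plug-in 90% P-interval:          ", p_interval(z_freq))
print("Bayesian 90% predictive interval:", p_interval(z_bayes))
```

Because the prior pulls the effect estimate toward zero, the Bayesian predictive interval typically extends to much larger replication P-values than the plug-in interval, which is the qualitative point the abstract makes.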
Journal of Critical Care | 2018
Vedant Gupta; Matthew Sousa; Nathan Kraitman; Rahul Annabathula; Olga A. Vsevolozhskaya; Steve W. Leung; Vincent L. Sorrell
Purpose: Sepsis is a highly prevalent and fatal condition, with reported cardiovascular event rates as high as 25–30% at 1 year. Risk stratification in septic patients has been extremely limited.
Materials and methods: 267 septic patients with detectable troponin levels, APACHE II scores, and CT scans of the chest or abdomen were assessed. Patients with a recent cardiac intervention were excluded. Coronary artery calcification (CAC) was identified as present or absent on body CT scans. Cardiovascular death, acute myocardial infarction (AMI), or percutaneous coronary intervention (PCI) at 1 year was assessed using multivariate logistic regression analysis.
Results: Patients with CAC were older and predominantly male, with more risk factors for coronary disease but similar peak troponin levels and APACHE II scores. In a multivariate analysis, CAC was predictive of the primary outcome (OR 6.827; 95% CI 1.336–54.686; p = 0.037). Patients with no CAC and no history of CHF or CKD were at low risk (<1%) for cardiovascular complications at 1 year, even at very high troponin levels (up to 8.0 ng/dL).
Conclusion: CAC risk-stratifies septic patients for cardiovascular complications better than traditional risk factors and can be identified on body CT scans. This novel risk-stratifying framework built on CAC can help guide individualized management of septic patients.
Highlights:
• Cardiovascular complications after sepsis are common and may be underappreciated.
• CAC identifies patients at risk for cardiovascular complications and all-cause mortality after an admission for sepsis.
• Absence of CAC confers a low risk (≤1%) of acute myocardial infarction or need for revascularization at 1 year.
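For readers wanting a concrete picture of the analysis type named above, here is a minimal sketch of a multivariate logistic regression yielding an odds ratio and 95% CI for a binary predictor such as CAC. The data are simulated and the covariates (age, troponin) are stand-ins; this is not the study's dataset or code.

```python
# Illustrative sketch only: logistic regression of a binary 1-year outcome on
# CAC presence plus covariates, reading off the CAC odds ratio and 95% CI.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 267
cac = rng.binomial(1, 0.5, n)                 # CAC present/absent on body CT
age = rng.normal(65, 10, n)
troponin = rng.lognormal(-1, 1, n)
logit = -4 + 1.9 * cac + 0.02 * (age - 65)    # assumed true model for the fake data
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([cac, age, troponin]))
fit = sm.Logit(outcome, X).fit(disp=0)
or_cac = np.exp(fit.params[1])                # exponentiate the CAC coefficient
ci = np.exp(fit.conf_int()[1])                # 95% CI on the odds-ratio scale
print(f"CAC odds ratio: {or_cac:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```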
JACC: Basic to Translational Science | 2018
Travis Sexton; Guoying Zhang; Tracy E. Macaulay; Leigh Ann Callahan; Richard Charnigo; Olga A. Vsevolozhskaya; Zhenyu Li; Susan S. Smyth
[Visual abstract only; no text abstract available.]
Environmental Pollution | 2018
Michael C. Petriello; Jessie B. Hoffman; Olga A. Vsevolozhskaya; Andrew J. Morris; Bernhard Hennig
The gut microbiome is sensitive to diet and environmental exposures and is involved in the regulation of host metabolism. Additionally, gut inflammation is an independent risk factor for the development of metabolic diseases, specifically atherosclerosis and diabetes. Exposures to dioxin-like pollutants occur primarily via ingestion of contaminated foods and are linked to increased risk of developing cardiometabolic diseases. We aimed to elucidate the detrimental impacts of dioxin-like pollutant exposure on gut microbiota and host gut health and metabolism in a mouse model of cardiometabolic disease. We utilized 16S rRNA sequencing, metabolomics, and regression modeling to examine the impact of PCB 126 on the microbiome and host metabolism and gut health. 16S rRNA sequencing showed that gut microbiota populations shifted at the phylum and genus levels in ways that mimic observations seen in chronic inflammatory diseases. PCB 126 reduced cecum alpha diversity (0.60-fold change; p = 0.001) and significantly increased the Firmicutes-to-Bacteroidetes ratio (1.63-fold change; p = 0.044). Toxicant-exposed mice exhibited quantifiable concentrations of PCB 126 in the colon, upregulation of Cyp1a1 gene expression, and increased markers of intestinal inflammation. Also, a significant correlation between circulating glucagon-like peptide-1 (GLP-1) and Bifidobacterium was evident and dependent on toxicant exposure. PCB 126 exposure disrupted the gut microbiota and host metabolism and increased intestinal and systemic inflammation. These data imply that the deleterious effects of dioxin-like pollutants may be initiated in the gut, and that modulation of gut microbiota may be a sensitive marker of pollutant exposures.
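Two of the quantities reported above, alpha diversity and the Firmicutes-to-Bacteroidetes ratio, reduce to simple computations on a taxon-count table. The sketch below uses one common diversity choice (the Shannon index) on made-up counts; the study's actual pipeline and diversity metric may differ.

```python
# Illustrative sketch (not the study's pipeline): Shannon alpha diversity per
# sample and the Firmicutes-to-Bacteroidetes ratio from a 16S count table.
import numpy as np

# rows = samples, columns = taxa (counts from a 16S abundance table; made up)
counts = np.array([[120, 30, 50, 10],
                   [200, 10, 20,  5]], dtype=float)
phylum = np.array(["Firmicutes", "Bacteroidetes", "Firmicutes", "Proteobacteria"])

def shannon(row):
    # Shannon index: -sum(p * log p) over nonzero relative abundances
    p = row / row.sum()
    p = p[p > 0]
    return -(p * np.log(p)).sum()

alpha = np.apply_along_axis(shannon, 1, counts)
fb_ratio = (counts[:, phylum == "Firmicutes"].sum(axis=1)
            / counts[:, phylum == "Bacteroidetes"].sum(axis=1))
print("Shannon diversity:", np.round(alpha, 3))
print("F/B ratio:        ", np.round(fb_ratio, 2))
```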
Genetic Epidemiology | 2017
Olga A. Vsevolozhskaya; Chia-Ling Kuo; Gabriel Ruiz; Luda Diatchenko; Dmitri V. Zaykin
The increasing accessibility of data to researchers makes it possible to conduct massive amounts of statistical testing. Rather than follow specific scientific hypotheses with statistical analysis, researchers can now test many possible relationships and let statistics generate hypotheses for them. The field of genetic epidemiology is an illustrative case, where testing of candidate genetic variants for association with an outcome has been replaced by agnostic screening of the entire genome. Poor replication rates of candidate gene studies have improved dramatically with the increase in genomic coverage, due to factors such as adoption of better statistical practices and availability of larger sample sizes. Here, we suggest that another important factor behind the improved replicability of genome‐wide scans is an increase in the amount of statistical testing itself. We show that an increase in the number of tested hypotheses increases the proportion of true associations among the variants with the smallest P‐values. We develop statistical theory to quantify how the expected proportion of genuine signals (EPGS) among top hits depends on the number of tests. This enrichment of top hits by real findings holds regardless of whether genome‐wide statistical significance has been reached in a study. Moreover, if we consider only those “failed” studies that produce no statistically significant results, the same enrichment phenomenon takes place: the proportion of true associations among top hits grows with the number of tests. The enrichment occurs even if the true signals are encountered at a logarithmically decreasing rate with additional testing.
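The enrichment claim lends itself to a toy simulation: draw m test statistics of which only a logarithmically growing number carry true effects, and track the proportion of true signals among the k smallest P-values. The specific choices below (effect size 5.0, k = 20, the log rate) are mine, not the paper's model, but the qualitative pattern matches the abstract's claim.

```python
# Toy simulation (my illustration, not the authors' derivation) of enrichment
# of top hits by true signals as the number of tests m grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k, reps = 20, 200                        # top hits examined; simulation replicates

for m in (1_000, 10_000, 100_000):
    n_true = int(3 * np.log10(m))        # true signals accrue only logarithmically
    props = []
    for _ in range(reps):
        z = rng.normal(size=m)
        z[:n_true] += 5.0                # true signals get a strong effect
        p = 2 * stats.norm.sf(np.abs(z))
        top = np.argsort(p)[:k]          # indices of the k smallest P-values
        props.append(np.mean(top < n_true))
    print(f"m={m:>7}, true={n_true:>2}: mean proportion true in top {k} = {np.mean(props):.2f}")
```

Under these assumptions the printed proportion rises with m even though the true-signal count grows only like log m, mirroring the EPGS enrichment described above.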
Drug and Alcohol Dependence | 2017
Olga A. Vsevolozhskaya; Fernando A. Wagner; James C. Anthony
Aims: At CPDD 2015, we applied parametric Hill functions to estimate the probability of drug dependence in relation to the duration of drug-taking experience. A problem we and others have encountered in the estimation of the risk of becoming a drug-dependence case is an observed point estimate of zero – the so-called “zero-numerator problem.” This problem can easily be observed in certain low-risk subgroups even when the sample is large (e.g., the incidence of heroin dependence among 12-year-old newly incident heroin users) or with small subgroup sample sizes. In these instances, an observed zero point estimate does not necessarily imply zero risk of developing dependence for the subgroup. Here, our aim is to describe our potential solution to the zero-numerator problem, based on a Bayesian model in conjunction with parametric Hill functions.

Methods: The traditional frequentist statistical approach can provide an estimate of the 95% upper bound of an incidence rate even with an observed zero in the numerator. A Bayesian approach is required if estimation of the incidence rate itself is of interest. The Bayesian approach demands specification of a prior distribution for the risk parameter. In this work, we explore the sensitivity of the Hill function parameter estimates to the choice of a particular informative prior distribution across a range of estimated chances of developing drug dependence very soon after onset of drug use.

Conclusions: Whereas we frame our work in relation to the risk of developing drug dependence syndromes, the zero-numerator problem is often faced in other contexts (e.g., pharmacokinetics, toxicology). Our approach, combining Bayesian statistics with Hill functions, is expected to provide a useful solution to these zero-numerator problems.

The Zero-Numerator Problem

# smoking days past month   n (unweighted)   y (# dependent)   p̂ (weighted)
 1    490    4    0.01
 2    233    8    0.03
 3    137    2    0.01
 4     91    2    0.02
 5     78    1    0.01
 6     25    0    0.00
 7     39    1    0.02
 8     27    2    0.07
 9     15    0    0.00
10     70    4    0.04
11      1    0    0.00
12     19    1    0.00
13     10    0    0.00
14      7    0    0.00
15     64    2    0.03
16      4    0    0.00
17      1    0    0.00
18      5    0    0.00
19      1    0    0.00
20     47    4    0.06
21      4    1    0.56
22      7    0    0.00
23      2    1    0.61
24      5    0    0.00
25     22    1    0.08
26      3    1    0.37
27      4    0    0.00
28      7    3    0.65
29      9    4    0.31
30     88   24    0.23

Table 1: Unweighted numbers of rapid-incident-onset (within 3 months of use) smokers, with the corresponding weighted probability of nicotine dependence.

Consider data from the United States (US) National Surveys on Drug Use and Health (NSDUH), 2004–2013, covering n = 1,515 (unweighted) subjects with smoking onset within 3 months of assessment who had smoked at least once during the past 30 days. Suppose we want to estimate the probability of nicotine dependence, p, given 6 days of smoking in the past month. Out of 25 such subjects, none qualified as a nicotine dependence case. Having observed no occurrences of an event does not imply that it has zero probability of occurrence. This situation is referred to as the zero-numerator problem; its instances are the rows of Table 1 with y = 0. The zero-numerator problem can be approached with a Bayesian model: it is well known that Beta(a, b) is a conjugate prior for the binomial distribution Bin(n, p), with corresponding posterior Beta(y + a, n − y + b).

Different Choices for Informative Priors

Often researchers want the data ‘to dominate’ and thus assign a prior probability of an event that is ‘uninformative’ or vague in some sense.
However, if one puts a vague prior distribution on the parameter values, e.g., p ∼ Beta(1, 1) (a uniform prior), then, in practice, all values of the nicotine dependence probability are equally likely after X smoking days past month – an unlikely scenario in the zero-numerator setting. Additionally, with a correctly specified informative prior, Bayesian inference is not susceptible to selection bias (e.g., when asking which number of smoking days past month is associated with the highest risk of nicotine dependence) or to multiple comparisons. Next, we look at the role of different informative priors on the results in zero-numerator problems.

We propose Beta(a, b) priors with a and b chosen to reflect prior knowledge about p – the probability of dependence after X smoking days. To capture this knowledge, we consider a ‘rolling window’ spanning X − w through X + w days. The parameters a and b are obtained as follows. Assume a uniform Beta(1, 1) distribution of the dependence probability over the X ± w smoking-days window. The likelihood is formed as a product of the binomial densities over X − w, ..., X − 1, X + 1, ..., X + w smoking days; note that the information at X smoking days is excluded from the likelihood formation. The posterior probability of dependence over the X ± w smoking days then follows a Beta(a, b) distribution with:

a = (# of dependent cases after X − w, ..., X − 1, X + 1, ..., X + w smoking days) + 1
b = (# of subjects without dependence over the same window) + 1

Under the assumption of a common p – the probability of dependence over the X ± w window – this posterior Beta(a, b) becomes the prior for the probability of dependence after X smoking days past month.

Using the above algorithm, the posterior expectations of nicotine dependence for different values of w are illustrated in Figure 1. Since the posterior expectation is a weighted average of the prior mean and the empirical proportion, the width of the rolling window affects the results. If the window contributing to the prior knowledge of nicotine dependence is too wide (w = 30), the posterior expectation is dominated by the overall prior mean (the flat line in the left panel of Figure 1). If the window is narrow, e.g., w = 1 or w = 2, the posterior probabilities are sensitive to day-to-day variability in the empirical chances of dependence. Regardless of the choice of w, the zero-numerator problem is completely eliminated. So which value of w should one use in practice? The answer can be obtained via leave-one-out cross-validation, which in our case finds w = 2 to be the optimal value. The right panel of Figure 1 shows the posterior expectations of dependence (with the corresponding 95% credible intervals) alongside the weighted empirical estimates from NSDUH; note the overlap between the 95% credible intervals and the 95% confidence intervals.

[Figure 1: Posterior probability of dependence versus days of drug use, for different window widths w.]
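Below is a minimal sketch of the rolling-window prior as described above, applied to the Table 1 counts. It uses the unweighted y/n rather than the survey weights, so the numbers are illustrative rather than a reproduction of the poster's estimates: for day X, the Beta prior pools the window X − w, ..., X + w excluding day X, and a conjugate update with day X's own data gives the posterior.

```python
# Sketch of the rolling-window Beta prior from the poster (my reading of it,
# not its original code), using the unweighted counts from Table 1.
import numpy as np

# (n, y) per smoking day 1..30: subjects and dependent cases, from Table 1
n = np.array([490, 233, 137, 91, 78, 25, 39, 27, 15, 70, 1, 19, 10, 7, 64,
              4, 1, 5, 1, 47, 4, 7, 2, 5, 22, 3, 4, 7, 9, 88])
y = np.array([4, 8, 2, 2, 1, 0, 1, 2, 0, 4, 0, 1, 0, 0, 2,
              0, 0, 0, 0, 4, 1, 0, 1, 0, 1, 1, 0, 3, 4, 24])

def posterior_mean(x, w=2):
    """Posterior E[p] for day x (1-based) under the window-w Beta prior."""
    i = x - 1
    window = [j for j in range(max(0, i - w), min(len(n), i + w + 1)) if j != i]
    a = y[window].sum() + 1                 # window successes + uniform Beta(1,1)
    b = (n[window] - y[window]).sum() + 1   # window failures + uniform Beta(1,1)
    # conjugate update with day x's own data: posterior Beta(y_x + a, n_x - y_x + b)
    return (y[i] + a) / (n[i] + a + b)

print(round(posterior_mean(6), 4))  # day 6: 0 of 25 dependent, yet estimate > 0
```

For day 6 (0 dependent cases out of 25), the posterior mean comes out near 0.027 rather than zero, which is exactly how the approach removes the zero-numerator problem.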
arXiv: Methodology | 2018
Olga A. Vsevolozhskaya; Dmitri V. Zaykin
arXiv: Methodology | 2018
Fengjiao Hu; Olga A. Vsevolozhskaya; Dmitri V. Zaykin
arXiv: Methodology | 2018
Olga A. Vsevolozhskaya; Gabriel Ruiz; Dmitri V. Zaykin