Joseph Usset
University of Kansas
Publication
Featured research published by Joseph Usset.
BMC Bioinformatics | 2016
Devin C. Koestler; Meaghan J. Jones; Joseph Usset; Brock C. Christensen; Rondi A. Butler; Michael S. Kobor; John K. Wiencke; Karl T. Kelsey
Background: Confounding due to cellular heterogeneity represents one of the foremost challenges currently facing Epigenome-Wide Association Studies (EWAS). Statistical methods leveraging the tissue-specificity of DNA methylation for deconvoluting the cellular mixture of heterogeneous biospecimens offer a promising solution; however, the performance of such methods depends entirely on the library of methylation markers being used for deconvolution. Here, we introduce a novel algorithm for Identifying Optimal Libraries (IDOL) that dynamically scans a candidate set of cell-specific methylation markers to find libraries that optimize the accuracy of cell fraction estimates obtained from cell mixture deconvolution. Results: Application of IDOL to a training set consisting of samples with both whole-blood DNA methylation data (Illumina HumanMethylation450 BeadArray (HM450)) and flow cytometry measurements of cell composition revealed an optimized library comprising 300 CpG sites. When compared with existing libraries, the library identified by IDOL demonstrated significantly better overall discrimination of the entire immune cell landscape (p = 0.038), and resulted in improved discrimination of 14 out of the 15 pairs of leukocyte subtypes. Estimates of cell composition across the samples in the training set using the IDOL library were highly correlated with their respective flow cytometry measurements, with all cell-specific R² > 0.99 and root mean square errors (RMSEs) ranging from 0.97% to 1.33% across leukocyte subtypes. Independent validation of the optimized IDOL library using two additional HM450 data sets showed similarly strong prediction performance, with all cell-specific R² > 0.90 and RMSE < 4.00%. In simulation studies, adjustments for cell composition using the IDOL library resulted in uniformly lower false positive rates compared to competing libraries, while also demonstrating an improved capacity to explain epigenome-wide variation in DNA methylation within two large publicly available HM450 data sets. Conclusions: Despite consisting of half as many CpGs compared to existing libraries for whole blood mixture deconvolution, the optimized IDOL library identified herein resulted in outstanding prediction performance across all considered data sets and demonstrated potential to improve the operating characteristics of EWAS involving adjustments for cell distribution. In addition to providing the EWAS community with an optimized library for whole blood mixture deconvolution, our work establishes a systematic and generalizable framework for the assembly of libraries that improve the accuracy of cell mixture deconvolution.
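The abstract above reports agreement between deconvolution estimates and flow cytometry as a cell-specific R² and RMSE. As a rough illustration of those two metrics only (not the IDOL algorithm itself), the sketch below computes them for one cell type. It assumes R² means the squared Pearson correlation, which the abstract does not state, and the sample values are hypothetical.

```python
import math


def rmse(estimated, measured):
    """Root mean square error between two equal-length sequences."""
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - m) ** 2 for e, m in zip(estimated, measured)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(estimated, measured):
    """Squared Pearson correlation (assumed meaning of R² here)."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_m = sum(measured) / n
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(estimated, measured))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov ** 2 / (var_e * var_m)


# Hypothetical cell percentages for one leukocyte subtype in five samples.
flow = [55.0, 60.0, 48.0, 70.0, 65.0]       # flow cytometry measurements
estimate = [56.0, 59.0, 49.0, 69.0, 66.0]   # deconvolution estimates

print(f"RMSE = {rmse(estimate, flow):.2f} percentage points")
print(f"R^2  = {r_squared(estimate, flow):.3f}")
```

Both metrics would be computed separately for each leukocyte subtype, which is why the abstract quotes a range of RMSE values.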
Cancer Epidemiology, Biomarkers & Prevention | 2017
Devin C. Koestler; Joseph Usset; Brock C. Christensen; Carmen J. Marsit; Margaret R. Karagas; Karl T. Kelsey; John K. Wiencke
Background: The peripheral blood neutrophil-to-lymphocyte ratio (NLR) is a cytologic marker of both inflammation and poor outcomes in patients with cancer. DNA methylation is a key element of the epigenetic program defining different leukocyte subtypes and may provide an alternative to cytology in assessing leukocyte profiles. Our aim was to create a bioinformatic tool to estimate NLR using DNA methylation, and to assess its diagnostic and prognostic performance in human populations. Methods: We developed a DNA methylation–derived NLR (mdNLR) index based on normal isolated leukocyte methylation libraries and established cell-mixture deconvolution algorithms. The method was applied to cancer case–control studies of the bladder, head and neck, ovary, and breast, as well as publicly available data on cancer-free subjects. Results: Across cancer studies, mdNLR scores were either elevated in cases relative to controls, or associated with increased hazard of death. High mdNLR values (>5) were strong indicators of poor survival. In addition, mdNLR scores were elevated in males, in non-Hispanic white versus Hispanic ethnicity, and increased with age. We also observed a significant interaction between cigarette smoking history and mdNLR on cancer survival. Conclusions: These results mean that our current understanding of mature leukocyte methylomes is sufficient to allow researchers and clinicians to apply epigenetically based analyses of NLR in clinical and epidemiologic studies of cancer risk and survival. Impact: As cytologic measurements of NLR are not always possible (i.e., archival blood), mdNLR, which is computed from DNA methylation signatures alone, has the potential to expand the scope of epigenome-wide association studies. Cancer Epidemiol Biomarkers Prev; 26(3); 328–38. ©2016 AACR.
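To make the idea of a methylation-derived NLR concrete, here is a minimal sketch that forms the ratio from cell fractions already estimated by a deconvolution step. The abstract does not give the exact definition, so the choice of lymphocyte subtypes, the cell-type names, and the example fractions are all assumptions. The cutoff of 5 is the "high mdNLR" value quoted in the abstract.

```python
def nlr_from_fractions(
    fractions,
    neutrophil_key="Neutrophil",                      # hypothetical label
    lymphocyte_keys=("CD4T", "CD8T", "Bcell", "NK"),  # assumed lymphocyte set
):
    """Ratio of the neutrophil fraction to the summed lymphocyte fractions."""
    neutrophils = fractions[neutrophil_key]
    lymphocytes = sum(fractions[key] for key in lymphocyte_keys)
    if lymphocytes <= 0:
        raise ValueError("lymphocyte fraction must be positive")
    return neutrophils / lymphocytes


# Hypothetical deconvolution output for one blood sample (fractions sum to 1).
sample = {
    "Neutrophil": 0.60,
    "CD4T": 0.12,
    "CD8T": 0.06,
    "Bcell": 0.03,
    "NK": 0.03,
    "Monocyte": 0.16,
}

HIGH_MDNLR_CUTOFF = 5  # threshold for "high" values quoted in the abstract

ratio = nlr_from_fractions(sample)
print(f"estimated NLR = {ratio:.2f}, high = {ratio > HIGH_MDNLR_CUTOFF}")
```

Because the ratio uses only estimated fractions, it can be computed for archival samples where no cell counts were recorded, which is the use case the abstract highlights.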
Cancer Epidemiology, Biomarkers & Prevention | 2016
Joseph Usset; Rama Raghavan; Jonathan Tyrer; Valerie McGuire; Weiva Sieh; Penelope M. Webb; Jenny Chang-Claude; Anja Rudolph; Hoda Anton-Culver; Andrew Berchuck; Louise A. Brinton; Julie M. Cunningham; Anna deFazio; Jennifer A. Doherty; Robert P. Edwards; Simon A. Gayther; Aleksandra Gentry-Maharaj; Marc T. Goodman; Estrid Høgdall; Allan Jensen; Sharon E. Johnatty; Lambertus A. Kiemeney; Susanne K. Kjaer; Melissa C. Larson; Galina Lurie; Leon F.A.G. Massuger; Usha Menon; Francesmary Modugno; Kirsten B. Moysich; Roberta B. Ness
Background: Many epithelial ovarian cancer (EOC) risk factors relate to hormone exposure and elevated estrogen levels are associated with obesity in postmenopausal women. Therefore, we hypothesized that gene–environment interactions related to hormone-related risk factors could differ between obese and non-obese women. Methods: We considered interactions between 11,441 SNPs within 80 candidate genes related to hormone biosynthesis and metabolism and insulin-like growth factors with six hormone-related factors (oral contraceptive use, parity, endometriosis, tubal ligation, hormone replacement therapy, and estrogen use) and assessed whether these interactions differed between obese and non-obese women. Interactions were assessed using logistic regression models and data from 14 case–control studies (6,247 cases; 10,379 controls). Histotype-specific analyses were also completed. Results: SNPs in the following candidate genes showed notable interaction: IGF1R (rs41497346, estrogen plus progesterone hormone therapy, histology = all, P = 4.9 × 10⁻⁶) and ESR1 (rs12661437, endometriosis, histology = all, P = 1.5 × 10⁻⁵). The most notable obesity–gene–hormone risk factor interaction was within INSR (rs113759408, parity, histology = endometrioid, P = 8.8 × 10⁻⁶). Conclusions: We have demonstrated the feasibility of assessing multifactor interactions in large genetic epidemiology studies. Follow-up studies are necessary to assess the robustness of our findings for ESR1, CYP11A1, IGF1R, CYP11B1, INSR, and IGFBP2. Future work is needed to develop powerful statistical methods able to detect these complex interactions. Impact: Assessment of multifactor interaction is feasible, and, here, suggests that the relationship between genetic variants within candidate genes and hormone-related risk factors may vary EOC susceptibility. Cancer Epidemiol Biomarkers Prev; 25(5); 780–90. ©2016 AACR.
PLOS ONE | 2018
Janelle R. Noel-MacDonnell; Joseph Usset; Ellen L. Goode; Brooke L. Fridley
Quality control, global biases, normalization, and analysis methods for RNA-Seq data are quite different from those for microarray-based studies. The assumption of normality is reasonable for microarray-based gene expression data; however, RNA-Seq data tend to follow an over-dispersed Poisson or negative binomial distribution. Little research has been done to assess how data transformations impact Gaussian model-based clustering with respect to clustering performance and accuracy in estimating the correct number of clusters in RNA-Seq data. In this article, we investigate Gaussian model-based clustering performance and accuracy in estimating the correct number of clusters by applying four data transformations (i.e., naïve, logarithmic, Blom, and variance stabilizing transformation) to simulated RNA-Seq data. To do so, an extensive simulation study was carried out in which the scenarios varied in terms of: how genes were selected to be included in the clustering analyses, size of the clusters, and number of clusters. Following the application of the different transformations to the simulated data, Gaussian model-based clustering was carried out. To assess clustering performance for each of the data transformations, the adjusted Rand index, clustering error rate, and concordance index were utilized. As expected, our results showed that clustering performance improved in scenarios where data transformations were applied to make the data appear “more” Gaussian in distribution.
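Two of the transformations named above can be sketched in a few lines. The Blom transformation maps the rank r of each of n values to the standard normal quantile at (r − 3/8) / (n + 1/4). The logarithmic version below uses base 2 and a pseudocount of 1, which are assumptions (the abstract does not specify either), and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def log_transform(counts, pseudocount=1.0):
    """log2(count + pseudocount); the pseudocount keeps zero counts finite."""
    return [math.log2(c + pseudocount) for c in counts]


def blom_transform(values):
    """Rank-based inverse normal (Blom) transformation with average ranks for ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


# Hypothetical read counts for one gene across five samples.
counts = [0, 3, 10, 250, 12]

print([round(x, 2) for x in log_transform(counts)])
print([round(x, 2) for x in blom_transform(counts)])
```

The Blom scores depend only on ranks, so the extreme count of 250 is pulled in to the same distance from the center as the smallest value, which is what makes the transformed data look closer to Gaussian.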
F1000Research | 2016
Richard Meier; Stefan Graw; Joseph Usset; Rama Raghavan; Junqiang Dai; Prabhakar Chalise; Shellie D. Ellis; Brooke L. Fridley; Devin C. Koestler
From March through August 2015, nearly 60 teams from around the world participated in the Prostate Cancer Dream Challenge (PCDC). Participating teams were faced with the task of developing prediction models for patient survival and treatment discontinuation using baseline clinical variables collected on metastatic castrate-resistant prostate cancer (mCRPC) patients in the comparator arm of four phase III clinical trials. In total, over 2,000 mCRPC patients treated with first-line docetaxel comprised the training and testing data sets used in this challenge. In this paper we describe: (a) the sub-challenges comprising the PCDC, (b) the statistical metrics used to benchmark prediction performance, (c) our analytical approach, and finally (d) our team’s overall performance in this challenge. Specifically, we discuss our curated, ad-hoc, feature selection (CAFS) strategy for identifying clinically important risk-predictors, the ensemble-based Cox proportional hazards regression framework used in our final submission, and the adaptation of our modeling framework based on the results from the intermittent leaderboard rounds. Strong predictors of patient survival were successfully identified utilizing our model building approach. Several of the identified predictors were new features created by our team via strategically merging collections of weak predictors. In each of the three intermittent leaderboard rounds, our prediction models scored among the top four models across all participating teams and our final submission ranked 9th place overall with an integrated area under the curve (iAUC) of 0.7711 computed in an independent test set. While the prediction performance of teams placing between 2nd and 10th (iAUC: 0.7710-0.7789) was better than the current gold-standard prediction model for prostate cancer survival, the top-performing team, FIMM-UTU, significantly outperformed all other contestants with an iAUC of 0.7915. In summary, our ensemble-based Cox regression framework with CAFS resulted in strong overall performance for predicting prostate cancer survival and represents a promising approach for future prediction problems.
Cancer Research | 2015
Joseph Usset; Brooke L. Fridley; Ellen L. Goode; Joellen M. Schildkraut; Paul Pharoah
Background: We investigated the role of multifactor interactions and epithelial ovarian cancer (EOC) risk using data collected within the Collaborative Oncological Gene-environment Study and the Ovarian Cancer Association Consortium containing 18,000 EOC cases and 26,000 controls. To date, researchers have identified 18 EOC susceptibility loci. However, it has been estimated that many more common variant loci exist; we hypothesized that some of the unexplained variation is due to gene-environment interactions. Similar to breast and endometrial cancers, many EOC risk factors relate to estrogen exposure, and increased levels of estrogen have been associated with obesity in post-menopausal women. Therefore, we hypothesized that gene-environment interactions dealing with hormone-related risk factors could differ between obese and non-obese women. Methods: We considered multifactor interactions between single nucleotide polymorphisms (SNPs) and six hormone-related factors: oral contraceptive use; parity; endometriosis; tubal ligation; hormone replacement therapy (HRT); and estrogen use; and assessed whether these GE interactions differed between obese and non-obese women. The SNPs in this analysis included the top associated markers within the 18 confirmed risk loci, as well as 36,811 SNPs that lie within 84 candidate gene regions related to hormone biosynthesis & metabolism and insulin-like growth factors. For the candidate SNP analyses, we assessed multifactor interactions across environmental factors using logistic regression models including age, study site, and population substructure. For the candidate gene analyses, a two-stage screening and testing procedure was utilized. All analyses were completed including: (1) all histologies; and (2) only serous histology. Results: For the 18 known EOC risk loci, no hormone-related risk factors showed statistically significant interaction at a threshold of 10⁻⁴. In contrast, the two-stage analysis of the candidate genes found several significant gene-environment-environment effects, after adjusting for multiple testing. For analysis of all cases, the SNPs in the following candidate genes showed a multifactor interaction with a hormone-related risk factor and obesity: IGFBP5 (HRT, rs729597, p = 1.3 × 10⁻¹³), CGA (HRT, rs58124219, p = 1.0 × 10⁻⁵), and SSTR1 (parity, rs77266093, p = 1.1 × 10⁻⁵). Similarly, the serous histology analyses detected a multifactor interaction with HRT, obesity and IGFBP5 (rs729597, p = 8.8 × 10⁻⁵). Additionally, serous-only analyses detected an interaction involving ESR1 (tubal ligation, Chr6:152246911(indel), p = 8.0 × 10⁻⁶). Conclusions: In a very large case-control collection, assessment of multifactor gene-environment interaction is feasible, and, here, suggests that the relationship between SNPs in candidate genes and hormone-related risk factors in EOC susceptibility may vary. Citation Format: Joseph Usset, Brooke Fridley, Ellen Goode, Joellen Schildkraut, Paul Pharoah. Assessment of multifactor gene-environment interactions and ovarian cancer risk: SNPs, obesity, and hormone-related risk factors. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 4684. doi:10.1158/1538-7445.AM2015-4684
Computational Statistics & Data Analysis | 2016
Joseph Usset; Ana-Maria Staicu; Arnab Maity
Archive | 2016
Prabhakar Chalise; Junqiang Dai; Devin C. Koestler; Joseph Usset; Richard Meier; Shellie D. Ellis; Brooke L. Fridley; Stefan Graw; Rama Raghavan
Journal of Agricultural Biological and Environmental Statistics | 2015
Joseph Usset; Arnab Maity; Ana-Maria Staicu; Armin Schwartzman
Clinical & Experimental Metastasis | 2018
Eric D. Young; Kyle Strom; Ashley F. Tsue; Joseph Usset; Seth MacPherson; John T. McGuire; Danny R. Welch