Yildiz E. Yilmaz
Memorial University of Newfoundland
Publication
Featured research published by Yildiz E. Yilmaz.
Biometrical Journal | 2011
Jerald F. Lawless; Yildiz E. Yilmaz
Sequentially observed survival times are of interest in many studies but there are difficulties in analyzing such data using nonparametric or semiparametric methods. First, when the duration of followup is limited and the times for a given individual are not independent, induced dependent censoring arises for the second and subsequent survival times. Non-identifiability of the marginal survival distributions for second and later times is another issue, since they are observable only if preceding survival times for an individual are uncensored. In addition, in some studies a significant proportion of individuals may never have the first event. Fully parametric models can deal with these features, but robustness is a concern. We introduce a new approach to address these issues. We model the joint distribution of the successive survival times by using copula functions, and provide semiparametric estimation procedures in which copula parameters are estimated without parametric assumptions on the marginal distributions. This provides more robust estimates and checks on the fit of parametric models. The methodology is applied to a motivating example involving relapse and survival following colon cancer treatment.
BMC Proceedings | 2014
Stefan Konigorski; Yildiz E. Yilmaz; Shelley B. Bull
We conduct genetic association analysis in the subset of unrelated individuals from the San Antonio Family Studies pedigrees, applying a two-stage approach to take account of the dependence between systolic and diastolic blood pressure (SBP and DBP). In the first stage, we adjust blood pressure for the effects of age, sex, smoking, and use of antihypertensive medication based on a novel modification of censored regression. In the second stage, we model the bivariate distribution of the adjusted SBP and DBP phenotypes by a copula function with interpretable SBP-DBP correlation parameters. This allows us to identify genetic variants associated with each of the adjusted blood pressures, as well as variants that explain the association between the two phenotypes. Within this framework, we define a pleiotropic variant as one that reduces the SBP-DBP correlation. Our results for whole genome sequence variants in the gene ULK4 on chromosome 3 suggest that inference obtained from a copula model can be more informative than findings from the SBP-specific and DBP-specific univariate models alone.
Journal of Clinical Oncology | 2013
Yildiz E. Yilmaz; Jerald F. Lawless; Irene L. Andrulis; Shelley B. Bull
With the ultimate aim of improving clinical management of breast cancer, investigators have sought to identify molecular genetic markers that stratify newly diagnosed patients into subtypes differing in short- or long-term prognosis. Conventional survival models can fail to describe adequately the relationship between subtype and disease recurrence, particularly when there is a substantial proportion of long-term disease-free survivors. The observed patterns of disease-free survival in an undifferentiated patient cohort may be explained by an underlying mixture of two subgroups: patients who will remain free of disease in the long term (ie, cured), and those who will experience disease recurrence within their lifetime (ie, susceptible). In this article, we review the concepts and methods of the mixture cure models and apply them in the analysis of molecular genetic prognostic factors for disease-free survival and time to disease recurrence in a cohort of patients with axillary lymph node-negative breast cancer.
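The two-subgroup idea in this abstract can be written as a short calculation. The sketch below is only an illustration, not the authors' model: it assumes a constant cure fraction and an exponential time-to-recurrence for susceptible patients, and the function name and parameter values are hypothetical.

```python
import math

def population_survival(t, cure_prob, rate):
    """Disease-free survival for the whole cohort under a two-group mixture.

    cure_prob: assumed fraction of patients who never recur (cured).
    rate: assumed exponential recurrence hazard among susceptible patients;
          the exponential form is an illustrative choice, not from the source.
    """
    susceptible_survival = math.exp(-rate * t)
    # Cured patients contribute survival 1 at every time t.
    return cure_prob + (1.0 - cure_prob) * susceptible_survival

# The curve starts at 1 and levels off at the cure fraction instead of 0.
for years in (0, 2, 5, 10, 50):
    print(years, round(population_survival(years, cure_prob=0.6, rate=0.3), 3))
```

The plateau at the cure fraction is the feature a conventional survival model, whose curve must eventually fall to zero, cannot reproduce.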
Lifetime Data Analysis | 2011
Yildiz E. Yilmaz; Jerald F. Lawless
Copula models for multivariate lifetimes have become widely used in areas such as biomedicine, finance and insurance. This paper fills some gaps in existing methodology for copula parameters and model assessment. We consider procedures based on likelihood and pseudolikelihood ratio statistics and introduce semiparametric maximum likelihood estimation leading to semiparametric versions. For cases where standard asymptotic approximations do not hold, we propose an efficient simulation technique for obtaining p-values. We apply these methods to tests for a copula model, based on embedding it in a larger copula family. It is shown that the likelihood and pseudolikelihood ratio tests are consistent even when the expanded copula model is misspecified. Power comparisons with two other tests of fit indicate that model expansion provides a convenient, powerful and robust approach. The methods are illustrated on an application concerning the time to loss of vision in the two eyes of an individual.
BMC Proceedings | 2011
Yildiz E. Yilmaz; Shelley B. Bull
Use of trait-dependent sampling designs in whole-genome association studies of sequence data can reduce total sequencing costs with modest losses of statistical efficiency. In a quantitative trait (QT) analysis of data from the Genetic Analysis Workshop 17 mini-exome for unrelated individuals in the Asian subpopulation, we investigate alternative designs that sequence only 50% of the entire cohort. In addition to a simple random sampling design, we consider extreme-phenotype designs that are of increasing interest in genetic association analysis of QTs, especially in studies concerned with the detection of rare genetic variants. We also evaluate a novel sampling design in which all individuals have a nonzero probability of being selected into the sample but in which individuals with extreme phenotypes have a proportionately larger probability. We take differential sampling of individuals with informative trait values into account by inverse probability weighting using standard survey methods which thus generalizes to the source population. In replicate 1 data, we applied the designs in association analysis of Q1 with both rare and common variants in the FLT1 gene, based on knowledge of the generating model. Using all 200 replicate data sets, we similarly analyzed Q1 and Q4 (which is known to be free of association with FLT1) to evaluate relative efficiency, type I error, and power. Simulation study results suggest that the QT-dependent selection designs generally yield greater than 50% relative efficiency compared to using the entire cohort, implying cost-effectiveness of 50% sample selection and worthwhile reduction of sequencing costs.
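The inverse probability weighting step described in this abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: the trait is simulated as standard normal, the selection probabilities (0.9 for extreme values, 0.3 otherwise) and the cutoff are hypothetical, and the target is a simple population mean rather than a genetic association parameter.

```python
import random

def selection_probability(y, cutoff, p_extreme=0.9, p_middle=0.3):
    """Everyone can be selected, but extreme trait values are favoured."""
    return p_extreme if abs(y) > cutoff else p_middle

def draw_sample(population, cutoff, rng):
    """Return (trait value, inverse-probability weight) for selected individuals."""
    sample = []
    for y in population:
        p = selection_probability(y, cutoff)
        if rng.random() < p:
            sample.append((y, 1.0 / p))
    return sample

def weighted_mean(values, weights):
    """Weighted mean with weights normalised by their sum."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

rng = random.Random(1)
population = [rng.gauss(0.0, 1.0) for _ in range(20000)]
sample = draw_sample(population, cutoff=1.0, rng=rng)

# Target: the population mean of y squared, which extreme sampling distorts.
target = sum(y * y for y in population) / len(population)
naive = sum(y * y for y, _ in sample) / len(sample)
ipw = weighted_mean([y * y for y, _ in sample], [w for _, w in sample])
print(round(target, 3), round(naive, 3), round(ipw, 3))
```

Weighting each selected individual by the inverse of their selection probability pulls the estimate back towards the source population value, whereas the unweighted sample mean overstates it because extreme values are over-represented.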
Computational Statistics & Data Analysis | 2011
Jerald F. Lawless; Yildiz E. Yilmaz
We consider bivariate distributions that are specified in terms of a parametric copula function and nonparametric or semiparametric marginal distributions. The performance of two semiparametric estimation procedures based on censored data is discussed: maximum likelihood (ML) and two-stage pseudolikelihood (PML) estimation. The two-stage procedure involves less computation and it is of interest to see whether it is significantly less efficient than the full maximum likelihood approach. We also consider cases where the copula model is misspecified, in which case PML may be better. Extensive simulation studies demonstrate that in the absence of covariates, two-stage estimation is highly efficient and has significant robustness advantages for estimating marginal distributions. In some settings, involving covariates and a high degree of association between responses, ML is more efficient. For the estimation of association, PML does not offer an advantage.
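The two-stage idea can be sketched in a few lines. The example below is a simplified illustration, not the procedure studied in the paper: it assumes complete (uncensored) data, uses a Clayton copula as an arbitrary example family, replaces the marginal estimates with rescaled ranks, and all function names are hypothetical.

```python
import math
import random

def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        w = 1.0 - rng.random()
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs

def pseudo_observations(values):
    """Stage 1: nonparametric margins via ranks scaled into (0, 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return [r / (n + 1.0) for r in ranks]

def clayton_log_pseudolikelihood(theta, u, v):
    """Log Clayton copula density summed over the pseudo-observations."""
    total = 0.0
    for ui, vi in zip(u, v):
        s = ui ** (-theta) + vi ** (-theta) - 1.0
        total += (math.log1p(theta)
                  - (1.0 + theta) * (math.log(ui) + math.log(vi))
                  - (2.0 + 1.0 / theta) * math.log(s))
    return total

def fit_clayton_two_stage(x, y, lo=0.01, hi=20.0, tol=1e-4):
    """Stage 2: maximise the pseudolikelihood in theta by golden-section search."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    f = lambda theta: clayton_log_pseudolikelihood(theta, u, v)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return (a + b) / 2.0

rng = random.Random(0)
pairs = sample_clayton(2000, theta=2.0, rng=rng)
estimate = fit_clayton_two_stage([p[0] for p in pairs], [p[1] for p in pairs])
print(round(estimate, 2))
```

Because only ranks enter the second stage, the copula parameter is estimated without any parametric assumption on the margins. Handling censored data, as the paper does, would require replacing the rank step and the likelihood contributions, which this sketch does not attempt.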
BMC Proceedings | 2016
Stefan Konigorski; Yildiz E. Yilmaz; Tobias Pischon
Recent work on genetic association studies suggests that much of the heritable variation in complex traits is unexplained, which indicates a need for using more biologically meaningful modeling approaches and appropriate statistical methods. In this study, we propose a biological framework and a corresponding statistical model incorporating multilevel biological measures, and illustrate it in the analysis of the real data provided by the Genetic Analysis Workshop (GAW) 19, which contains whole genome sequence (WGS), gene expression (GE), and blood pressure (BP) data. We investigate the direct effect of single-nucleotide variants (SNVs) on BP and GE, while considering the non-directional dependence between BP and GE, by using copula functions to jointly model BP and GE conditional on SNVs. We implement the method for analysis on a genome-wide scale, and illustrate it within an association analysis of 68,727 SNVs on chromosome 19 that lie in or around genes with available GE measures. Although there is no indication of inflated type I errors under the proposed method, our results show that the association tests have smaller p values than tests under univariate models for common and rare variants using single-variant tests and gene-based multimarker tests. Hence, considering multilevel biological measures and modeling the dependence structure between these measures by using a plausible graphical approach may lead to more informative findings than standard univariate tests of common variants and well-recognized gene-based rare variant tests.
Genetic Epidemiology | 2011
Joan E. Bailey-Wilson; Jennifer S. Brennan; Shelley B. Bull; Robert Culverhouse; Yoonhee Kim; Yuan Jiang; Jeesun Jung; Qing Li; Claudia Lamina; Ying Liu; Reedik Mägi; Yue S. Niu; Claire L. Simpson; Libo Wang; Yildiz E. Yilmaz; Heping Zhang; Zhaogong Zhang
Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus‐specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population‐specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow‐up in the presence of extreme locus heterogeneity and large numbers of potential predictors. Genet. Epidemiol. 35:S92–S100, 2011.
PLOS ONE | 2018
Yanjing He; Michelle E. Penney; Amit A. Negandhi; Patrick S. Parfrey; Sevtap Savas; Yildiz E. Yilmaz
Background: Metastasis is a major cause of mortality in cancer. Identifying prognostic factors that distinguish patients who will experience metastasis in the short term from those who will be free of metastasis in the long term is of particular interest in current medical research. The objective of this study was to examine whether select genetic polymorphisms can differentiate colorectal cancer patients based on timing and long-term risk of metastasis. Methods: The patient cohort consisted of 402 stage I-III colorectal cancer patients with microsatellite instability (MSI)-low (MSI-L) or microsatellite stable (MSS) tumors. We applied a multivariable mixture cure model, which is the proper model when there is a substantial group of patients who remain free of metastasis in the long term, to 26 polymorphisms. Time-dependent receiver operator characteristic (ROC) curve analysis was performed to determine the change in discriminatory accuracy of the models when the significant SNPs were included. Results: After adjusting for significant baseline characteristics, two polymorphisms were significantly associated with time-to-metastasis: TT and TC genotypes of the XRCC3 Thr241Met (p = 0.042) and the 3R/3R genotype of TYMS 5’-UTR variable number tandem repeat (VNTR) (p = 0.009) were associated with decreased time-to-metastasis. ROC curves showed that the discriminatory accuracy of the model increased slightly when these polymorphisms were added to the significant baseline characteristics. Conclusions: Our results indicate that XRCC3 Thr241Met and TYMS 5’-UTR VNTR polymorphisms are associated with time-to-metastasis, and may have potential biological roles in expediting the metastatic process. Once replicated, these associations could contribute to the development of precision medicine for colorectal cancer patients.
Genetic Epidemiology | 2018
Stefan Konigorski; Yuan Wang; Candemir Cigsar; Yildiz E. Yilmaz
In genetic association studies, it is important to distinguish direct and indirect genetic effects in order to build truly functional models. For this purpose, we consider a directed acyclic graph setting with genetic variants, primary and intermediate phenotypes, and confounding factors. In order to make valid statistical inference on direct genetic effects on the primary phenotype, it is necessary to consider all potential effects in the graph, and we propose to use the estimating equations method with robust Huber–White sandwich standard errors. We evaluate the proposed causal inference based on estimating equations (CIEE) method and compare it with traditional multiple regression methods, the structural equation modeling method, and sequential G‐estimation methods through a simulation study for the analysis of (completely observed) quantitative traits and time‐to‐event traits subject to censoring as primary phenotypes. The results show that CIEE provides valid estimators and inference by successfully removing the effect of intermediate phenotypes from the primary phenotype and is robust against measured and unmeasured confounding of the indirect effect through observed factors. All other methods except the sequential G‐estimation method for quantitative traits fail in some scenarios where their test statistics yield inflated type I errors. In the analysis of the Genetic Analysis Workshop 19 dataset, we estimate and test genetic effects on blood pressure accounting for intermediate gene expression phenotypes. The results show that CIEE can identify genetic variants that would be missed by traditional regression analyses. CIEE is computationally fast, widely applicable to different fields, and available as an R package.