Andrei Yakovlev
University of Rochester
Publications
Featured research published by Andrei Yakovlev.
Journal of the American Statistical Association | 2003
Alex Tsodikov; Joseph G. Ibrahim; Andrei Yakovlev
This article considers the utility of the bounded cumulative hazard model, an appealing alternative to the widely used two-component mixture model, in cure rate estimation. This approach has the following distinct advantages: (1) it provides a natural way to extend the proportional hazards regression model, leading to a wide class of extended hazard regression models; (2) in some settings the model can be interpreted in terms of biologically meaningful parameters; and (3) the model structure is particularly suitable for semiparametric and Bayesian methods of statistical inference. Although the model has been around for less than a decade, a large body of theoretical results and applications has already been reported. This review article is intended to give a broad overview of these modeling techniques and the associated statistical problems, which are discussed in the context of cancer survival data.
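For readers unfamiliar with the model, the sketch below contrasts the two cure models in Python. It assumes the form in which the bounded cumulative hazard model is commonly written, S(t) = exp(-theta * F(t)) with F a proper distribution function, so that survival levels off at the cure fraction exp(-theta). The two-component mixture model is taken as S(t) = pi + (1 - pi) * S0(t). Several choices are illustrative assumptions and are not taken from the article or its estimation methods: the exponential F and S0, the log-linear link for theta, and all function names.

```python
import math


def exp_cdf(t, rate):
    # Assumed latency distribution F: Exponential(rate), for illustration only.
    return 1.0 - math.exp(-rate * t)


def bch_survival(t, theta, rate):
    # Bounded cumulative hazard model: S(t) = exp(-theta * F(t)).
    # The cumulative hazard theta * F(t) never exceeds theta.
    return math.exp(-theta * exp_cdf(t, rate))


def bch_hazard(t, theta, rate):
    # Hazard h(t) = theta * f(t), where f is the density of the assumed F.
    return theta * rate * math.exp(-rate * t)


def mixture_survival(t, cure_prob, rate):
    # Two-component mixture model for comparison: S(t) = pi + (1 - pi) * S0(t),
    # with an (assumed) exponential S0 for the susceptible subpopulation.
    return cure_prob + (1.0 - cure_prob) * math.exp(-rate * t)


def theta_from_covariates(beta, z, beta0=0.0):
    # Hypothetical log-linear link theta(z) = exp(beta0 + beta . z).
    return math.exp(beta0 + sum(b * x for b, x in zip(beta, z)))


if __name__ == "__main__":
    theta, rate = 1.2, 0.5
    cure = math.exp(-theta)  # limit of S(t) as t -> infinity
    for t in (0.0, 1.0, 5.0, 50.0):
        print(t, bch_survival(t, theta, rate), mixture_survival(t, cure, rate))
    # Because h(t | z) = theta(z) * f(t), the hazard ratio between two covariate
    # values equals theta(z1) / theta(z0) at every t: proportional hazards.
    th1 = theta_from_covariates([0.7], [1.0])
    th0 = theta_from_covariates([0.7], [0.0])
    for t in (1.0, 4.0):
        print(t, bch_hazard(t, th1, rate) / bch_hazard(t, th0, rate))
```

With the mixture's cure probability set to exp(-theta), both curves start at 1 and level off at the same cure fraction, but they generally differ at intermediate times. The constant hazard ratio printed at the end illustrates advantage (1): letting theta depend on covariates yields a proportional hazards model with a cured fraction.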
The Annals of Applied Statistics | 2007
Alexander Y. Gordon; Galina V. Glazko; Xing Qiu; Andrei Yakovlev
The Bonferroni multiple testing procedure is commonly perceived as being overly conservative in large-scale simultaneous testing situations such as those that arise in microarray data analysis. The objective of the present study is to show that this popular belief is due to overly stringent requirements that are typically imposed on the procedure rather than to its conservative nature. To overcome its notorious conservatism, we advocate using the Bonferroni selection rule as a procedure that controls the per family error rate (PFER). The present paper reports the first study of stability properties of the Bonferroni and Benjamini-Hochberg procedures. The Bonferroni procedure shows superior stability in terms of the variance of both the number of true discoveries and the total number of discoveries, a property that is especially important in the presence of correlations between individual p-values. Its stability and the ability to provide strong control of the PFER make the Bonferroni procedure an attractive choice in microarray studies.
Nature | 2008
Helene McMurray; Erik R. Sampson; George Compitello; Conan Kinsey; Laurel Newman; Bradley Smith; Shaw-Ree Chen; Lev B. Klebanov; Peter Salzman; Andrei Yakovlev; Hartmut Land
Understanding the molecular underpinnings of cancer is of critical importance to the development of targeted intervention strategies. Identification of such targets, however, is notoriously difficult and unpredictable. Malignant cell transformation requires the cooperation of a few oncogenic mutations that cause substantial reorganization of many cell features and induce complex changes in gene expression patterns. Genes critical to this multifaceted cellular phenotype have therefore only been identified after signalling pathway analysis or on an ad hoc basis. Our observations that cell transformation by cooperating oncogenic lesions depends on synergistic modulation of downstream signalling circuitry suggest that malignant transformation is a highly cooperative process, involving synergy at multiple levels of regulation, including gene expression. Here we show that a large proportion of genes controlled synergistically by loss-of-function p53 and Ras activation are critical to the malignant state of murine and human colon cells. Notably, 14 out of 24 ‘cooperation response genes’ were found to contribute to tumour formation in gene perturbation experiments. In contrast, only 1 in 14 perturbations of the genes responding in a non-synergistic manner had a similar effect. Synergistic control of gene expression by oncogenic mutations thus emerges as an underlying key to malignancy, and provides an attractive rationale for identifying intervention targets in gene networks downstream of oncogenic gain- and loss-of-function mutations.
BMC Bioinformatics | 2005
Xing Qiu; Andrew I. Brooks; Lev B. Klebanov; Andrei Yakovlev
Background: Stochastic dependence between gene expression levels in microarray data is of critical importance for the methods of statistical inference that resort to pooling test statistics across genes. It is frequently assumed that dependence between genes (or tests) is sufficiently weak to justify the proposed methods of testing for differentially expressed genes. The potential impact of between-gene correlations on the performance of such methods has yet to be explored.
Results: The paper presents a systematic study of correlation between the t-statistics associated with different genes. We report the effects of four different normalization methods using a large set of microarray data on childhood leukemia in addition to several sets of simulated data. Our findings help decipher the correlation structure of microarray data before and after the application of normalization procedures.
Conclusion: A long-range correlation in microarray data manifests itself in thousands of genes that are heavily correlated with a given gene in terms of the associated t-statistics. By using normalization methods it is possible to significantly reduce correlation between the t-statistics computed for different genes. Normalization procedures affect both the true correlation, stemming from gene interactions, and the spurious correlation induced by random noise. When analyzing real-world biological data sets, normalization procedures are unable to completely remove correlation between the test statistics. The long-range correlation structure also persists in normalized data.
Biology Direct | 2007
Lev B. Klebanov; Andrei Yakovlev
Background: Microarray gene expression data are commonly perceived as being extremely noisy because of many imperfections inherent in the current technology. A recent study conducted by the MicroArray Quality Control (MAQC) Consortium and published in Nature Biotechnology provides a unique opportunity to probe the true level of technical noise in such data.
Results: In the present report, the MAQC study is reanalyzed in order to quantitatively assess measurement errors inherent in high-density oligonucleotide array technology (Affymetrix platform). The level of noise is directly estimated from technical replicates of gene expression measurements in the absence of biological variability. For each probe set, the magnitude of random fluctuations across technical replicates is characterized by the standard deviation of the corresponding log-expression signal. The resultant standard deviations appear to be uniformly small and symmetrically distributed across probe sets. The observed noise level does not cause any tangible bias in estimated pairwise correlation coefficients, even though such coefficients are particularly sensitive to noise in microarray data.
Conclusion: The reported analysis strongly suggests that, contrary to popular belief, the random fluctuations of gene expression signals caused by technical noise are quite low and the effect of such fluctuations on the results of statistical inference from Affymetrix GeneChip microarray data is negligibly small.
Reviewers: The paper was reviewed by A. Mushegian, K. Jordan, and E. Koonin.
Statistical Applications in Genetics and Molecular Biology | 2005
Xing Qiu; Lev B. Klebanov; Andrei Yakovlev
Stochastic dependence between gene expression levels in microarray data is of critical importance for the methods of statistical inference that resort to pooling test statistics across genes. The empirical Bayes methodology in its nonparametric and parametric formulations, as well as closely related methods employing a two-component mixture model, are typical examples. It is frequently assumed that dependence between gene expressions (or associated test statistics) is sufficiently weak to justify the application of such methods for selecting differentially expressed genes. By applying resampling techniques to simulated and real biological data sets, we have studied the potential impact of the correlation between gene expression levels on statistical inference based on the empirical Bayes methodology. We report evidence from these analyses that this impact may be quite strong, leading to a high variance of the number of differentially expressed genes. This study also pinpoints the specific components of the empirical Bayes method where the reported effect manifests itself.
BMC Bioinformatics | 2006
Xing Qiu; Yuanhui Xiao; Alexander Y. Gordon; Andrei Yakovlev
Background: The number of genes declared differentially expressed is a random variable, and its variability can be assessed by resampling techniques. Another important stability indicator is the frequency with which a given gene is selected across subsamples. We have conducted studies to assess the stability and some other properties of several gene selection procedures with biological and simulated data.
Results: Using resampling techniques, we have found that some genes are selected much less frequently (across subsamples) than other genes with the same adjusted p-values. The extent to which this type of instability manifests itself can be assessed by a method introduced in this paper. The effect of correlation between gene expression levels on the performance of multiple testing procedures is studied by computer simulations.
Conclusion: Resampling represents a tool for reducing the set of initially selected genes to those with a sufficiently high selection frequency. Using resampling techniques, it is also possible to assess the variability of different performance indicators. Stability properties of several multiple testing procedures are described at length in the present paper.
BMC Bioinformatics | 2009
Rui Hu; Xing Qiu; Galina V. Glazko; Lev B. Klebanov; Andrei Yakovlev
Background: Microarray technology is commonly used as a simple screening tool with a focus on selecting genes that exhibit extremely large differential expressions between different phenotypes. Used in this way, it cannot select genes that change their relationships with other genes in different biological conditions (differentially correlated genes). We intend to enrich the above procedure by proposing a nonparametric selection procedure that selects differentially correlated genes.
Results: Using both simulations and resampling techniques, we found that our procedure correctly detected genes that were not differentially expressed but differentially correlated. We also applied our procedure to a set of biological data and found some potentially important genes that were not selected by the traditional method.
Discussion and Conclusion: Microarray technology yields multidimensional information on the function of the whole genome. Rather than treating intergene correlation as a nuisance to traditional gene selection procedures, which are essentially univariate, our method utilizes the rich information contained in the correlation as a new selection criterion. It can provide additional useful candidate genes for biologists.
BMC Bioinformatics | 2004
Yuanhui Xiao; Robert D. Frisina; Alexander Y. Gordon; Lev B. Klebanov; Andrei Yakovlev
Journal of Bioinformatics and Computational Biology | 2007
Lev B. Klebanov; Galina V. Glazko; Peter Salzman; Andrei Yakovlev; Yuanhui Xiao
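The Annals of Applied Statistics 2007 abstract (Gordon, Glazko, Qiu, Yakovlev) compares two procedures by the variance of the number of discoveries when p-values are correlated. The first is the Bonferroni selection rule, used to control the PFER; the second is the Benjamini-Hochberg procedure. The minimal Python sketch below shows one way to set up such a comparison by simulation. It is an illustrative assumption, not the design used in that paper, and it makes no claim to reproduce that paper's results. In particular, the one-factor equicorrelated normal model, the parameter values, and the names `simulate`, `gamma` and `q` are assumptions made for the sketch.

```python
import math
import random
import statistics


def one_sided_p(z):
    # Upper-tail p-value of a standard normal statistic: P(Z >= z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bonferroni_count(pvals, gamma):
    # Bonferroni selection rule p <= gamma / m. Each true null is rejected with
    # probability gamma / m, so PFER <= gamma * m0 / m <= gamma under any dependence.
    m = len(pvals)
    return sum(p <= gamma / m for p in pvals)


def bh_count(pvals, q):
    # Benjamini-Hochberg step-up: reject the k smallest p-values, where k is the
    # largest index i with p_(i) <= i * q / m (k = 0 if there is none).
    m = len(pvals)
    k = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= i * q / m:
            k = i
    return k


def simulate(m=1000, m1=100, effect=3.0, rho=0.5, reps=200, gamma=1.0, q=0.1, seed=1):
    # Hypothetical one-factor model: z_i = sqrt(rho) * W + sqrt(1 - rho) * e_i + mu_i,
    # so every pair of statistics has correlation rho; the first m1 are false nulls.
    rng = random.Random(seed)
    bonf, bh = [], []
    for _ in range(reps):
        w = rng.gauss(0.0, 1.0)  # shared factor inducing correlation within a replicate
        pvals = []
        for i in range(m):
            mu = effect if i < m1 else 0.0
            z = math.sqrt(rho) * w + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0) + mu
            pvals.append(one_sided_p(z))
        bonf.append(bonferroni_count(pvals, gamma))
        bh.append(bh_count(pvals, q))
    return bonf, bh


if __name__ == "__main__":
    bonf, bh = simulate()
    print("Bonferroni: mean", statistics.mean(bonf), "variance", statistics.variance(bonf))
    print("BH:         mean", statistics.mean(bh), "variance", statistics.variance(bh))
```

Varying `rho` while holding the other settings fixed shows how between-test correlation affects the spread of each count. The PFER bound noted in the comment relies only on linearity of expectation, which is why it holds regardless of the correlation between tests.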