Aedín C. Culhane
Harvard University
Publication
Featured research published by Aedín C. Culhane.
Science | 2007
Bijan Sobhian; Genze Shao; Dana R. Lilli; Aedín C. Culhane; Lisa A. Moreau; Bing Xia; David M. Livingston; Roger A. Greenberg
Mutations affecting the BRCT domains of the breast cancer–associated tumor suppressor BRCA1 disrupt the recruitment of this protein to DNA double-strand breaks (DSBs). The molecular structures at DSBs recognized by BRCA1 are presently unknown. We report the interaction of the BRCA1 BRCT domain with RAP80, a ubiquitin-binding protein. RAP80 targets a complex containing the BRCA1-BARD1 (BRCA1-associated ring domain protein 1) E3 ligase and the deubiquitinating enzyme (DUB) BRCC36 to MDC1-γH2AX–dependent lysine6- and lysine63-linked ubiquitin polymers at DSBs. These events are required for cell cycle checkpoint and repair responses to ionizing radiation, implicating ubiquitin chain recognition and turnover in the BRCA1-mediated repair of DSBs.
Nature Genetics | 2009
John P. A. Ioannidis; David B. Allison; Catherine A. Ball; Issa Coulibaly; Xiangqin Cui; Aedín C. Culhane; Mario Falchi; Cesare Furlanello; Giuseppe Jurman; Jon Mangion; Tapan Mehta; Michael Nitzberg; Grier P. Page; Enrico Petretto; Vera van Noort
Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition and several public databases exist. However, not all data are publicly available, and even when available, it is unknown whether the published results are reproducible by independent scientists. Here we evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005–2006. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. More strict publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.
BMC Bioinformatics | 2006
Ian B. Jeffery; Aedín C. Culhane
Background: Numerous feature selection methods have been applied to the identification of differentially expressed genes in microarray data. These include simple fold change, the classical t-statistic and moderated t-statistics. Even though these methods return gene lists that are often dissimilar, few direct comparisons of them exist. We present an empirical study in which we compare some of the most commonly used feature selection methods. We apply these to 9 publicly available datasets and compare both the gene lists produced and how these perform in class prediction of test datasets. Results: In this study, we compared the efficiency of the following feature selection methods: significance analysis of microarrays (SAM), analysis of variance (ANOVA), empirical Bayes t-statistic, template matching, maxT, between group analysis (BGA), area under the receiver operating characteristic (ROC) curve, the Welch t-statistic, fold change, rank products, and sets of randomly selected genes. In each case these methods were applied to 9 different binary (two-class) microarray datasets. First, we found little agreement in the gene lists produced by the different methods. Only 8 to 21% of genes were in common across all 10 feature selection methods. Second, we evaluated the class prediction efficiency of each gene list in training and test cross-validation using four supervised classifiers. Conclusion: We report that the choice of feature selection method, the number of genes in the gene list, the number of cases (samples) and the noise in the dataset substantially influence classification success. Recommendations are made for the choice of feature selection method. Area under a ROC curve performed well with datasets that had low levels of noise and large sample size. Rank products performed well when datasets had low numbers of samples or high levels of noise. The empirical Bayes t-statistic performed well across a range of sample sizes.
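As a rough illustration of why different feature selection methods can return dissimilar gene lists, the sketch below ranks genes by two of the statistics named in the abstract (the Welch t-statistic and fold change) and measures how much their top lists overlap. The toy expression values, the gene names, and the helper functions `welch_t`, `log_fold_change`, and `top_genes` are all hypothetical; this is not the study's code or data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def log_fold_change(a, b):
    """Difference of group means; assumes values are already on a log2 scale."""
    return mean(a) - mean(b)


def top_genes(scores, k):
    """Names of the k genes with the largest absolute score."""
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return set(ranked[:k])


# Hypothetical toy data: gene -> (class A values, class B values), log2 scale.
expr = {
    "g1": ([8.0, 8.2, 7.9], [5.0, 5.1, 4.9]),  # large, consistent shift
    "g2": ([6.0, 9.0, 3.0], [4.0, 4.1, 3.9]),  # moderate shift, very noisy
    "g3": ([5.0, 5.1, 4.9], [4.6, 4.5, 4.7]),  # small but consistent shift
    "g4": ([7.0, 7.1, 6.9], [7.0, 6.9, 7.1]),  # no shift
}

t_scores = {g: welch_t(a, b) for g, (a, b) in expr.items()}
fc_scores = {g: log_fold_change(a, b) for g, (a, b) in expr.items()}

top_t = top_genes(t_scores, 2)    # favours consistent genes: g1, g3
top_fc = top_genes(fc_scores, 2)  # favours large shifts: g1, g2
overlap = len(top_t & top_fc) / 2
print(sorted(top_t), sorted(top_fc), overlap)
```

In this toy case the noisy gene `g2` is selected by fold change but not by the t-statistic, so the two top-2 lists share only one gene.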
Bioinformatics | 2005
Aedín C. Culhane; Jean Thioulouse; Guy Perrière
SUMMARY MADE4, microarray ade4, is a software package that facilitates multivariate analysis of microarray gene-expression data. MADE4 accepts a wide variety of gene-expression data formats. MADE4 takes advantage of the extensive multivariate statistical and graphical functions in the R package ade4, extending these for application to microarray data. In addition, MADE4 provides new graphical and visualization tools that aid in interpretation of multivariate analysis of microarray data.
Bioinformatics | 2002
Aedín C. Culhane; Guy Perrière; Elizabeth C. Considine; Thomas G. Cotter
MOTIVATION Most supervised classification methods are limited by the requirement for more cases than variables. In microarray data the number of variables (genes) far exceeds the number of cases (arrays), and thus filtering and pre-selection of genes is required. We describe the application of Between Group Analysis (BGA) to the analysis of microarray data. A feature of BGA is that it can be used when the number of variables (genes) exceeds the number of cases (arrays). BGA is based on carrying out an ordination of groups of samples, using a standard method such as Correspondence Analysis (COA), rather than an ordination of the individual microarray samples. As such, it can be viewed as a method of carrying out COA with grouped data. RESULTS We illustrate the power of the method using two cancer data sets. In both cases, we can quickly and accurately classify test samples from any number of specified a priori groups and identify the genes which characterize these groups. We obtained very high rates of correct classification, as determined by jack-knife or validation experiments with training and test sets. The results are comparable to those from other methods in terms of accuracy but the power and flexibility of BGA make it an especially attractive method for the analysis of microarray cancer data.
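The idea of ordinating group means rather than individual samples can be sketched in a few lines. The example below is a simplified stand-in, not the published method: it assumes only two groups and plain Euclidean geometry (a PCA-style variant rather than Correspondence Analysis), in which case the single between-group axis is the direction joining the two group centroids. The data and the function names (`centroid`, `fit_two_group_axis`, `classify`) are hypothetical.

```python
from statistics import mean


def centroid(samples):
    """Per-gene mean across samples (each sample is a list of gene values)."""
    return [mean(col) for col in zip(*samples)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_two_group_axis(group_a, group_b):
    """Return the between-group axis and the projections of both centroids."""
    ca, cb = centroid(group_a), centroid(group_b)
    axis = [a - b for a, b in zip(ca, cb)]
    return axis, dot(ca, axis), dot(cb, axis)


def classify(sample, axis, proj_a, proj_b):
    """Assign a sample to the group whose projected centroid is nearer."""
    s = dot(sample, axis)
    return "A" if abs(s - proj_a) <= abs(s - proj_b) else "B"


# Hypothetical training data: 2 samples per group, 6 genes (more genes than cases).
group_a = [[9, 8, 5, 5, 2, 1], [8, 9, 5, 4, 1, 2]]
group_b = [[2, 1, 5, 5, 8, 9], [1, 2, 4, 5, 9, 8]]

axis, proj_a, proj_b = fit_two_group_axis(group_a, group_b)

# Genes with the largest absolute weight on the axis characterise the groups.
top = sorted(range(len(axis)), key=lambda i: abs(axis[i]), reverse=True)[:4]

print(classify([7, 9, 5, 5, 2, 2], axis, proj_a, proj_b))  # test sample near group A
print(classify([2, 2, 5, 5, 7, 9], axis, proj_a, proj_b))  # test sample near group B
```

Because only the group centroids are ordinated, the fit does not require more cases than genes, which is the property the abstract highlights.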
Journal of the National Cancer Institute | 2012
Benjamin Haibe-Kains; Christine Desmedt; Sherene Loi; Aedín C. Culhane; Gianluca Bontempi; John Quackenbush; Christos Sotiriou
BACKGROUND Single sample predictors (SSPs) and Subtype classification models (SCMs) are gene expression-based classifiers used to identify the four primary molecular subtypes of breast cancer (basal-like, HER2-enriched, luminal A, and luminal B). SSPs use hierarchical clustering, followed by nearest centroid classification, based on large sets of tumor-intrinsic genes. SCMs use a mixture of Gaussian distributions based on sets of genes with expression specifically correlated with three key breast cancer genes (estrogen receptor [ER], HER2, and aurora kinase A [AURKA]). The aim of this study was to compare the robustness, classification concordance, and prognostic value of these classifiers with those of a simplified three-gene SCM in a large compendium of microarray datasets. METHODS Thirty-six publicly available breast cancer datasets (n = 5715) were subjected to molecular subtyping using five published classifiers (three SSPs and two SCMs) and SCMGENE, the new three-gene (ER, HER2, and AURKA) SCM. We used the prediction strength statistic to estimate robustness of the classification models, defined as the capacity of a classifier to assign the same tumors to the same subtypes independently of the dataset used to fit it. We used Cohen κ and Cramer V coefficients to assess concordance between the subtype classifiers and association with clinical variables, respectively. We used Kaplan-Meier survival curves and cross-validated partial likelihood to compare prognostic value of the resulting classifications. All statistical tests were two-sided. RESULTS SCMs were statistically significantly more robust than SSPs, with SCMGENE being the most robust because of its simplicity. SCMGENE was statistically significantly concordant with published SCMs (κ = 0.65-0.70) and SSPs (κ = 0.34-0.59), statistically significantly associated with ER (V = 0.64), HER2 (V = 0.52) status, and histological grade (V = 0.55), and yielded similar strong prognostic value. 
CONCLUSION Our results suggest that adequate classification of the major and clinically relevant molecular subtypes of breast cancer can be robustly achieved with quantitative measurements of three key genes.
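For readers unfamiliar with the two agreement statistics named in the methods, the following is a minimal sketch of Cohen's κ (concordance between two classifiers' subtype calls) and Cramér's V (association between a classification and a clinical variable). The labels are invented for illustration, and the helper names `cohen_kappa` and `cramers_v` are assumptions rather than anything from the study.

```python
import math
from collections import Counter


def cohen_kappa(labels_1, labels_2):
    """Chance-corrected agreement between two sets of labels for the same items."""
    n = len(labels_1)
    observed = sum(a == b for a, b in zip(labels_1, labels_2)) / n
    c1, c2 = Counter(labels_1), Counter(labels_2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)


def cramers_v(x, y):
    """Cramér's V from the chi-squared statistic of the x-by-y contingency table."""
    n = len(x)
    cx, cy, cxy = Counter(x), Counter(y), Counter(zip(x, y))
    chi2 = 0.0
    for a in cx:
        for b in cy:
            e = cx[a] * cy[b] / n  # expected count under independence
            chi2 += (cxy[(a, b)] - e) ** 2 / e
    k = min(len(cx), len(cy)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical subtype calls from two classifiers on the same 8 tumours.
calls_1 = ["basal", "basal", "her2", "lumA", "lumA", "lumB", "lumB", "lumA"]
calls_2 = ["basal", "basal", "her2", "lumA", "lumB", "lumB", "lumB", "lumA"]
kappa = cohen_kappa(calls_1, calls_2)

# Hypothetical two-level variables: perfectly associated vs. unrelated.
v_perfect = cramers_v(["a", "a", "b", "b"], ["p", "p", "n", "n"])
v_none = cramers_v(["a", "a", "b", "b"], ["p", "n", "p", "n"])
print(round(kappa, 3), v_perfect, v_none)
```

Both statistics are bounded above by 1; κ is 0 when agreement equals what chance alone would produce, and V is 0 when the two variables are independent in the table.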
Nucleic Acids Research | 2012
Misha Kapushesky; Tomasz Adamusiak; Tony Burdett; Aedín C. Culhane; Anna Farne; Alexey Filippov; Ele Holloway; Andrey Klebanov; Nataliya Kryvych; Natalja Kurbatova; Pavel Kurnosov; James P. Malone; Olga Melnichuk; Robert Petryszak; Nikolay Pultsin; Gabriella Rustici; Andrew Tikhonov; Ravensara S. Travillian; Eleanor Williams; Andrey Zorin; Helen E. Parkinson; Alvis Brazma
Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report, we have made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species and contains expression data measured for 19 014 biological conditions in 136 551 assays from 5598 independent studies.
Nucleic Acids Research | 2004
Misha Kapushesky; Patrick Kemmeren; Aedín C. Culhane; Steffen Durinck; Jan Ihmels; Christine Körner; Meelis Kull; Aurora Torrente; Ugis Sarkans; Jaak Vilo; Alvis Brazma
Expression Profiler (EP, http://www.ebi.ac.uk/expressionprofiler) is a web-based platform for microarray gene expression and other functional genomics-related data analysis. The new architecture, Expression Profiler: next generation (EP:NG), modularizes the original design and allows individual analysis-task-related components to be developed by different groups, yet still work together seamlessly and share the same user interface look and feel. Data analysis components for gene expression data preprocessing, missing value imputation, filtering, clustering methods, visualization, significant gene finding, between group analysis and other statistical components are available from the EBI (European Bioinformatics Institute) web site. The web-based design of Expression Profiler supports data sharing and collaborative analysis in a secure environment. Developed tools are integrated with the microarray gene expression database ArrayExpress and form the exploratory analytical front-end to those data. EP:NG is an open-source project, encouraging broad distribution and further extensions from the scientific community.
Cancer Research | 2008
Alfred S.L. Cheng; Aedín C. Culhane; Michael W.Y. Chan; Chinnambally Venkataramu; Mathias Ehrich; Aejaz Nasir; Benjamin Rodriguez; Pearlly S. Yan; John Quackenbush; Kenneth P. Nephew; Timothy J. Yeatman; Tim H M Huang
Estrogen imprinting is used to describe a phenomenon in which early developmental exposure to endocrine disruptors increases breast cancer risk later in adult life. We propose that long-lived, self-regenerating stem and progenitor cells are more susceptible to the exposure injury than terminally differentiated epithelial cells in the breast duct. Mammospheres, containing enriched breast progenitors, were used as an exposure system to simulate this imprinting phenomenon in vitro. Using MeDIP-chip, a methylation microarray screening method, we found that 0.5% (120 loci) of human CpG islands were hypermethylated in epithelial cells derived from estrogen-exposed progenitors compared with the non-estrogen-exposed control cells. This epigenetic event may lead to progressive silencing of tumor suppressor genes, including RUNX3, in these epithelial cells, which also occurred in primary breast tumors. Furthermore, normal tissue in close proximity to the tumor site also displayed RUNX3 hypermethylation, suggesting that this aberrant event occurs in early breast carcinogenesis. The high prevalence of estrogen-induced epigenetic changes in primary tumors and the surrounding histologically normal tissues provides the first empirical link between estrogen injury of breast stem/progenitor cells and carcinogenesis. This finding also offers a mechanistic explanation as to why a tumor suppressor gene, such as RUNX3, can be heritably silenced by epigenetic mechanisms in breast cancer.
BMC Bioinformatics | 2003
Aedín C. Culhane; Guy Perrière
Background: Rapid development of DNA microarray technology has resulted in different laboratories adopting numerous different protocols and technological platforms, which has severely affected the comparability of array data. Current cross-platform comparisons of microarray gene expression data are usually based on cross-referencing the annotation of each gene transcript represented on the arrays, extracting a list of genes common to all arrays and comparing expression data of this gene subset. Unfortunately, filtering of genes to a subset represented across all arrays often excludes many thousands of genes, because different subsets of genes from the genome are represented on different arrays. We describe the application of a powerful yet simple method for cross-platform comparison of gene expression data. Co-inertia analysis (CIA) is a multivariate method that identifies trends or co-relationships in multiple datasets which contain the same samples. CIA simultaneously finds ordinations (dimension reduction diagrams) from the datasets that are most similar. It does this by finding successive axes from the two datasets with maximum covariance. CIA can be applied to datasets where the number of variables (genes) far exceeds the number of samples (arrays), as is the case with microarray analyses. Results: We illustrate the power of CIA for cross-platform analysis of gene expression data by using it to identify the main common relationships in expression profiles on a panel of 60 tumour cell lines from the National Cancer Institute (NCI) which have been subjected to microarray studies using both Affymetrix and spotted cDNA array technology. The co-ordinates of the CIA projections of the cell lines from each dataset are graphed in a bi-plot and are connected by a line, the length of which indicates the divergence between the two datasets. Thus, CIA provides a graphical representation of consensus and divergence between the gene expression profiles from different microarray platforms. In addition, the genes that define the main trends in the analysis can be easily identified. Conclusions: CIA is a robust, efficient approach to coupling of gene expression datasets. CIA provides simple graphical representations of the results, making it a particularly attractive method for the identification of relationships between large datasets.
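The phrase "axes from the two datasets with maximum covariance" can be made concrete with a small sketch. Under simplifying assumptions (column-centred data, uniform sample weights, no Correspondence Analysis transformation, only the first axis pair), the first pair of co-inertia axes is the leading pair of singular vectors of the cross-covariance matrix between the two datasets, found here by power iteration. The toy matrices and all function names are hypothetical, and the iteration assumes the cross-covariance matrix is not all zeros.

```python
import math


def center_columns(m):
    means = [sum(c) / len(c) for c in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def transpose(m):
    return [list(c) for c in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def first_coinertia_axes(x, y, iters=100):
    """x, y: samples-by-genes matrices with the same samples in the same order."""
    xc, yc = center_columns(x), center_columns(y)
    n = len(x)
    # Cross-covariance between every gene in x and every gene in y.
    c = [[sum(a * b for a, b in zip(gx, gy)) / n for gy in transpose(yc)]
         for gx in transpose(xc)]
    ct = transpose(c)
    # Start from the largest row of c so the first product is non-zero.
    b = normalize(max(c, key=lambda r: sum(v * v for v in r)))
    for _ in range(iters):
        a = normalize(matvec(c, b))   # axis (gene weights) for dataset x
        b = normalize(matvec(ct, a))  # axis (gene weights) for dataset y
    return a, b, matvec(xc, a), matvec(yc, b)


# Hypothetical data: 4 samples measured on two platforms (3 genes and 2 genes).
x = [[5, 1, 3], [4, 2, 3], [1, 5, 3], [2, 4, 3]]
y = [[10, 2], [9, 3], [2, 9], [3, 10]]

axis_x, axis_y, scores_x, scores_y = first_coinertia_axes(x, y)
for sx, sy in zip(scores_x, scores_y):
    print(round(sx, 2), round(sy, 2))
```

Each sample receives one score per dataset on the shared axis. Plotting the two scores for a sample and joining them, after suitable scaling, corresponds to the connecting lines described in the abstract, and genes with large absolute weights in `axis_x` or `axis_y` are the ones defining the trend.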