Margus Lukk
University of Cambridge
Publications
Featured research published by Margus Lukk.
Nucleic Acids Research | 2007
Helen Parkinson; Misha Kapushesky; Mohammadreza Shojatalab; Niran Abeygunawardena; Richard M. R. Coulson; Anna Farne; Ele Holloway; Nikolay Kolesnykov; P. Lilja; Margus Lukk; Roby Mani; Tim F. Rayner; Anjan Sharma; E. William; Ugis Sarkans; Alvis Brazma
ArrayExpress is a public database for high-throughput functional genomics data. ArrayExpress consists of two parts: the ArrayExpress Repository, which is a MIAME-supportive public archive of microarray data, and the ArrayExpress Data Warehouse, which is a database of gene expression profiles selected from the repository and consistently re-annotated. Archived experiments can be queried by experiment attributes, such as keywords, species, array platform, authors, journals or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms, and can be visualized. ArrayExpress is a rapidly growing database; it currently contains data from >50 000 hybridizations and >1 500 000 individual expression profiles. ArrayExpress supports community standards, including MIAME, MAGE-ML and, more recently, the proposal for a spreadsheet-based data exchange format: MAGE-TAB. Availability: .
Nucleic Acids Research | 2009
Helen E. Parkinson; Misha Kapushesky; Nikolay Kolesnikov; Gabriella Rustici; Mohammadreza Shojatalab; Niran Abeygunawardena; Hugo Bérubé; Miroslaw Dylag; Ibrahim Emam; Anna Farne; Ele Holloway; Margus Lukk; James P. Malone; Roby Mani; Ekaterina Pilicheva; Tim F. Rayner; Faisal Ibne Rezwan; Anjan Sharma; Eleanor Williams; Xiangqun Zheng Bradley; Tomasz Adamusiak; Marco Brandizi; Tony Burdett; Richard M. R. Coulson; Maria Krestyaninova; Pavel Kurnosov; Eamonn Maguire; Sudeshna Guha Neogi; Philippe Rocca-Serra; Susanna-Assunta Sansone
ArrayExpress (http://www.ebi.ac.uk/arrayexpress) consists of three components: the ArrayExpress Repository, a public archive of functional genomics experiments and supporting data; the ArrayExpress Warehouse, a database of gene expression profiles and other bio-measurements; and the ArrayExpress Atlas, a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200 000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently ultra-high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.
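To make "doubles in size every 15 months" concrete, the sketch below projects the Repository's size under the assumption that the doubling time stays constant. This is an extrapolation for illustration only, not a figure reported in the paper; `projected_size` is a hypothetical helper.

```python
def projected_size(initial: float, months: float, doubling_months: float = 15.0) -> float:
    """Project database size assuming a constant doubling time (an assumption)."""
    return initial * 2 ** (months / doubling_months)

# Equivalent compound monthly growth rate for a 15-month doubling time.
monthly_rate = 2 ** (1 / 15) - 1

# Starting from ~6000 experiments, two doubling periods (30 months) later:
print(round(projected_size(6000, 30)))  # 24000
print(f"{monthly_rate:.1%}")            # 4.7%
```

A 15-month doubling time corresponds to roughly 4.7% growth per month, which is why the archive's size can change substantially between two yearly database updates.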
Nucleic Acids Research | 2011
Helen E. Parkinson; Ugis Sarkans; Nikolay Kolesnikov; Niran Abeygunawardena; Tony Burdett; Miroslaw Dylag; Ibrahim Emam; Anna Farne; Emma Hastings; Ele Holloway; Natalja Kurbatova; Margus Lukk; James Malone; Roby Mani; Ekaterina Pilicheva; Gabriella Rustici; Anjan Sharma; Eleanor Williams; Tomasz Adamusiak; Marco Brandizi; Nataliya Sklyar; Alvis Brazma
The ArrayExpress Archive (http://www.ebi.ac.uk/arrayexpress) is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted by users and imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology enabled interfaces include queries based on technology and sample attributes such as disease, cell types and anatomy.
Cell | 2015
Diego Villar; Camille Berthelot; Sarah Aldridge; Tim F. Rayner; Margus Lukk; Miguel Pignatelli; Thomas J. Park; Robert Deaville; Jonathan Thor Erichsen; Anna J. Jasinska; James M. A. Turner; Mads F. Bertelsen; Elizabeth P. Murchison; Paul Flicek; Duncan T. Odom
Summary: The mammalian radiation has corresponded with rapid changes in noncoding regions of the genome, but we lack a comprehensive understanding of regulatory evolution in mammals. Here, we track the evolution of promoters and enhancers active in liver across 20 mammalian species from six diverse orders by profiling genomic enrichment of H3K27 acetylation and H3K4 trimethylation. We report that rapid evolution of enhancers is a universal feature of mammalian genomes. Most of the recently evolved enhancers arise from ancestral DNA exaptation, rather than lineage-specific expansions of repeat elements. In contrast, almost all liver promoters are partially or fully conserved across these species. Our data further reveal that recently evolved enhancers can be associated with genes under positive selection, demonstrating the power of this approach for annotating regulatory adaptations in genomic sequences. These results provide important insight into the functional genetics underpinning mammalian regulatory evolution.
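The abstract separates promoters from enhancers using two histone marks. The sketch below assumes one common convention (promoter = H3K4me3 peak, with or without H3K27ac; enhancer = H3K27ac peak that overlaps no H3K4me3 peak); the exact rule used in the study may differ, and the peak coordinates are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates

def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def classify_marks(h3k27ac: List[Peak], h3k4me3: List[Peak]) -> Tuple[List[Peak], List[Peak]]:
    """Split peaks into promoters and enhancers under the assumed rule:
    promoter = any H3K4me3 peak; enhancer = H3K27ac peak overlapping no H3K4me3 peak."""
    promoters = list(h3k4me3)
    enhancers = [p for p in h3k27ac if not any(overlaps(p, q) for q in h3k4me3)]
    return promoters, enhancers

# Hypothetical peaks: the first H3K27ac peak overlaps an H3K4me3 peak.
ac = [("chr1", 100, 300), ("chr1", 1000, 1200)]
me3 = [("chr1", 250, 400)]
promoters, enhancers = classify_marks(ac, me3)
print(enhancers)  # [('chr1', 1000, 1200)]
```

The pairwise overlap test is quadratic; for genome-wide peak sets, sorting peaks or using an interval tree would be the usual choice.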
Nature Biotechnology | 2010
Margus Lukk; Misha Kapushesky; Janne Nikkilä; Helen Parkinson; Angela Goncalves; Wolfgang Huber; Esko Ukkonen; Alvis Brazma
To the Editor: Although there is only one human genome sequence, different genes are expressed in many different cell types and tissues, as well as in different developmental stages or diseases. The structure of this ‘expression space’ is still largely unknown, as most transcriptomics experiments focus on sampling small regions. We have constructed a global gene expression map by integrating microarray data from 5,372 human samples representing 369 different cell and tissue types, disease states and cell lines. These have been compiled in an online resource (http://www.ebi.ac.uk/gxa/array/U133A) that allows the user to search for a gene of interest and find the conditions in which it is over- or underexpressed, or, conversely, to find which genes are over- or underexpressed in a particular condition. An analysis of the structure of the expression space reveals that it can be described by a small number of distinct expression profile classes and that the first three principal components of this space have biological interpretations. The hematopoietic system, solid tissues and incompletely differentiated cell types are arranged on the first principal axis; cell lines, neoplastic samples and nonneoplastic primary tissue–derived samples are on the second principal axis; and the nervous system is separated from the rest of the samples on the third axis. We also show that most cell lines cluster together rather than with their tissues of origin.
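The letter interprets the first three principal components of a samples-by-genes expression matrix. As a minimal, dependency-free sketch of how such axes can be obtained, the code below runs power iteration on X^T X with deflation. This is a generic PCA illustration, not the authors' pipeline, and the toy matrix is hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = genes

def center_columns(x: Matrix) -> Matrix:
    """Subtract each gene's mean across samples."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[v - m for v, m in zip(row, means)] for row in x]

def matvec(x: Matrix, v: List[float]) -> List[float]:
    """x @ v: project every sample onto v."""
    return [sum(a * b for a, b in zip(row, v)) for row in x]

def tmatvec(x: Matrix, u: List[float]) -> List[float]:
    """x^T @ u, computed without building the transpose."""
    return [sum(x[i][j] * u[i] for i in range(len(x))) for j in range(len(x[0]))]

def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def principal_axes(x: Matrix, k: int, iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Leading k principal axes by power iteration on X^T X with deflation.
    Assumes k does not exceed the rank of the centred matrix."""
    xc = center_columns(x)
    rng = random.Random(seed)
    axes: List[List[float]] = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in xc[0]])
        for _ in range(iters):
            v = tmatvec(xc, matvec(xc, v))  # one multiplication by X^T X
            for a in axes:                  # remove components along earlier axes
                d = sum(p * q for p, q in zip(v, a))
                v = [p - d * q for p, q in zip(v, a)]
            v = normalize(v)
        axes.append(v)
    return axes

# Toy data: 4 samples x 3 genes; genes 0 and 1 co-vary, gene 2 is constant.
expr = [[0, 0, 5], [1, 1, 5], [2, 2, 5], [3, 3, 5]]
pc1 = principal_axes(expr, k=1)[0]
print([round(abs(a), 3) for a in pc1])  # [0.707, 0.707, 0.0]
pc1_scores = matvec(center_columns(expr), pc1)  # sample coordinates on the first axis
```

Principal axes are only defined up to sign, so the example compares absolute values. At the scale of thousands of samples and genes, a linear-algebra library's SVD would be the practical choice; power iteration is shown here because each step is easy to follow.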
The EMBO Journal | 2014
Sandra Blanco; Sabine Dietmann; Joana V. Flores; Shobbir Hussain; Claudia Kutter; Peter Humphreys; Margus Lukk; Patrick Lombard; Lucas Treps; Martyna Popis; Stefanie Kellner; Sabine M. Hölter; Lillian Garrett; Wolfgang Wurst; Lore Becker; Thomas Klopstock; Helmut Fuchs; Valérie Gailus-Durner; Martin Hrabĕ de Angelis; Ragnhildur Káradóttir; Mark Helm; Jernej Ule; Joseph G. Gleeson; Duncan T. Odom; Michaela Frye
Mutations in the cytosine‐5 RNA methyltransferase NSun2 cause microcephaly and other neurological abnormalities in mice and humans. How post‐transcriptional methylation contributes to the human disease is currently unknown. By comparing gene expression data with global cytosine‐5 RNA methylomes in patient fibroblasts and NSun2‐deficient mice, we find that loss of cytosine‐5 RNA methylation increases the angiogenin‐mediated endonucleolytic cleavage of transfer RNAs (tRNA) leading to an accumulation of 5′ tRNA‐derived small RNA fragments. Accumulation of 5′ tRNA fragments in the absence of NSun2 reduces protein translation rates and activates stress pathways leading to reduced cell size and increased apoptosis of cortical, hippocampal and striatal neurons. Mechanistically, we demonstrate that angiogenin binds with higher affinity to tRNAs lacking site‐specific NSun2‐mediated methylation and that the presence of 5′ tRNA fragments is sufficient and required to trigger cellular stress responses. Furthermore, the enhanced sensitivity of NSun2‐deficient brains to oxidative stress can be rescued through inhibition of angiogenin during embryogenesis. In conclusion, failure in NSun2‐mediated tRNA methylation contributes to human diseases via stress‐induced RNA cleavage.
Pacific Symposium on Biocomputing | 2002
Mikko Koivisto; Markus Perola; Teppo Varilo; William Hennah; Jesper Ekelund; Margus Lukk; Leena Peltonen; Esko Ukkonen; Heikki Mannila
We describe a new method for finding haplotype blocks based on the use of the minimum description length principle. We give a rigorous definition of the quality of a segmentation of a genomic region into blocks, and describe a dynamic programming algorithm for finding the optimal segmentation with respect to this measure. We also describe a method for finding the probability of a block boundary for each pair of adjacent markers: this gives a tool for evaluating the significance of each block boundary. We have applied the method to the published data of Daly et al. The results are in relatively good agreement with the published results, but also show clear differences in the predicted block boundaries and their strengths. We also give results on the block structure in population isolates.
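The abstract describes scoring every segmentation of a marker region into blocks and finding the optimum by dynamic programming. The sketch below shows that dynamic programme for any additive block cost. The cost function `mdl_style_cost` is a hypothetical, crude description-length stand-in (bits to store each distinct block haplotype plus bits to index it per sequence), not the paper's exact MDL measure, and the block-boundary probabilities are not covered.

```python
import math
from typing import Callable, List, Sequence, Tuple

def segment(n_markers: int, block_cost: Callable[[int, int], float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Optimal segmentation of markers 0..n-1 into contiguous blocks [i, j)
    minimising the summed block cost; O(n^2) calls to block_cost."""
    best = [0.0] + [math.inf] * n_markers  # best[j]: optimal cost of markers 0..j-1
    back = [0] * (n_markers + 1)            # back[j]: start of the last block ending at j
    for j in range(1, n_markers + 1):
        for i in range(j):
            c = best[i] + block_cost(i, j)
            if c < best[j]:
                best[j], back[j] = c, i
    blocks, j = [], n_markers
    while j > 0:                            # walk back pointers to recover the blocks
        blocks.append((back[j], j))
        j = back[j]
    return best[n_markers], blocks[::-1]

def mdl_style_cost(haplotypes: Sequence[str]) -> Callable[[int, int], float]:
    """Hypothetical description-length cost: one bit per marker for each distinct
    block haplotype, plus log2(#distinct) bits per sequence to say which one it is."""
    n = len(haplotypes)
    def cost(i: int, j: int) -> float:
        distinct = len({h[i:j] for h in haplotypes})
        return distinct * (j - i) + n * math.log2(distinct)
    return cost

# Hypothetical haplotypes: markers 0-1 and markers 2-3 each show two patterns.
haps = ["0000", "0011", "1100", "1111"]
total, blocks = segment(4, mdl_style_cost(haps))
print(total, blocks)  # 16.0 [(0, 2), (2, 4)]
```

Because the total cost is a sum over blocks, the optimal cost of the first j markers depends only on where the last block starts, which is what makes the quadratic-time recursion exact.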
Bioinformatics | 2009
Audrey Kauffmann; Tim F. Rayner; Helen E. Parkinson; Misha Kapushesky; Margus Lukk; Alvis Brazma; Wolfgang Huber
Summary: ArrayExpress is one of the largest public repositories of microarray datasets. R/Bioconductor provides a comprehensive suite of microarray analysis and integrative bioinformatics software. However, easy ways of importing datasets from ArrayExpress into R/Bioconductor have been lacking. Here, we present such a tool that is suitable for both interactive and automated use. Availability: The ArrayExpress package is available from the Bioconductor project at http://www.bioconductor.org. A user's guide and examples are provided with the package. Supplementary information: Supplementary data are available at Bioinformatics online.
BMC Bioinformatics | 2011
Matthew N. McCall; Peter Murakami; Margus Lukk; Wolfgang Huber; Rafael A. Irizarry
Background: Microarray technology has become a widely used tool in the biological sciences. Over the past decade, the number of users has grown exponentially, and with the number of applications and secondary data analyses rapidly increasing, we expect this rate to continue. Various initiatives such as the External RNA Control Consortium (ERCC) and the MicroArray Quality Control (MAQC) project have explored ways to provide standards for the technology. For microarrays to become generally accepted as a reliable technology, statistical methods for assessing quality will be an indispensable component; however, there remains a lack of consensus in both defining and measuring microarray quality.
Results: We begin by providing a precise definition of microarray quality and reviewing existing Affymetrix GeneChip quality metrics in light of this definition. We show that the best-performing metrics require multiple arrays to be assessed simultaneously. While such multi-array quality metrics are adequate for bench science, as microarrays begin to be used in clinical settings, single-array quality metrics will be indispensable. To this end, we define a single-array version of one of the best multi-array quality metrics and show that this metric performs as well as the best multi-array metrics. We then use this new quality metric to assess the quality of microarray data available via the Gene Expression Omnibus (GEO) using more than 22,000 Affymetrix HGU133a and HGU133plus2 arrays from 809 studies.
Conclusions: We find that approximately 10 percent of these publicly available arrays are of poor quality. Moreover, the quality of microarray measurements varies greatly from hybridization to hybridization, study to study, and lab to lab, with some experiments producing unusable data. Many of the concepts described here are applicable to other high-throughput technologies.
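To illustrate why many quality metrics need several arrays at once, the sketch below computes relative log expression (RLE), a widely used multi-array metric: each gene's value is compared with that gene's median across all arrays, and an array whose median RLE drifts far from zero is suspect. This is not the paper's single-array metric; the toy data and the 0.5 cutoff are hypothetical.

```python
from statistics import median
from typing import List

def rle(log_expr: List[List[float]]) -> List[List[float]]:
    """Relative log expression: each value minus its gene's median across arrays.
    Rows are genes, columns are arrays, values on a log2 scale."""
    out = []
    for row in log_expr:
        m = median(row)
        out.append([v - m for v in row])
    return out

def median_rle_per_array(log_expr: List[List[float]]) -> List[float]:
    """Median RLE of each array; a well-behaved array should sit near zero."""
    r = rle(log_expr)
    return [median([row[a] for row in r]) for a in range(len(r[0]))]

# Toy data: 3 genes x 4 arrays; array 3 is shifted up by about 2 log2 units.
expr = [
    [5.0, 5.1, 4.9, 7.0],
    [8.0, 8.2, 7.9, 10.1],
    [3.0, 2.9, 3.1, 5.0],
]
meds = median_rle_per_array(expr)
flagged = [a for a, m in enumerate(meds) if abs(m) > 0.5]  # 0.5 is an arbitrary illustrative cutoff
print(flagged)  # [3]
```

With a single array, each gene's median is just its own value and every RLE is zero, so the metric carries no information; a single-array metric has to replace the cross-array median with some external reference.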
eLife | 2014
Benoit Ballester; Alejandra Medina-Rivera; Dominic Schmidt; Mar Gonzàlez-Porta; Matthew Carlucci; Xiaoting Chen; Kyle Chessman; Andre J. Faure; Alister P. W. Funnell; Angela Goncalves; Claudia Kutter; Margus Lukk; Suraj Menon; William M. McLaren; Klara Stefflova; Stephen Watt; Matthew T. Weirauch; Merlin Crossley; John C. Marioni; Duncan T. Odom; Paul Flicek; Michael D. Wilson
As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were also found as CRMs in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control. DOI: http://dx.doi.org/10.7554/eLife.02626.001
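The abstract reports what fraction of TF-bound regions fall into CRMs but does not spell out how CRMs were called. The sketch below assumes a simple rule: merge overlapping peaks from all TFs into clusters, and call a cluster a CRM if at least two distinct TFs bind it. The TF names and coordinates are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates

def call_crms(peaks_by_tf: Dict[str, List[Peak]], min_tfs: int = 2) -> List[Tuple[str, int, int, frozenset]]:
    """Merge overlapping peaks from all TFs into clusters and keep clusters
    bound by at least `min_tfs` distinct TFs (an assumed CRM definition)."""
    tagged = sorted((c, s, e, tf) for tf, peaks in peaks_by_tf.items() for c, s, e in peaks)
    clusters: List[list] = []
    for chrom, start, end, tf in tagged:
        last = clusters[-1] if clusters else None
        if last is not None and last[0] == chrom and start < last[2]:
            last[2] = max(last[2], end)  # extend the current cluster
            last[3].add(tf)
        else:
            clusters.append([chrom, start, end, {tf}])
    return [(c[0], c[1], c[2], frozenset(c[3])) for c in clusters if len(c[3]) >= min_tfs]

def fraction_in_crms(peaks_by_tf: Dict[str, List[Peak]], crms) -> float:
    """Share of all TF peaks that overlap at least one CRM."""
    peaks = [p for ps in peaks_by_tf.values() for p in ps]
    hit = sum(any(p[0] == c and p[1] < e and s < p[2] for c, s, e, _ in crms) for p in peaks)
    return hit / len(peaks)

# Hypothetical peaks for three placeholder TFs.
peaks = {
    "TF_A": [("chr1", 100, 200), ("chr1", 5000, 5100)],
    "TF_B": [("chr1", 150, 260)],
    "TF_C": [("chr2", 40, 90)],
}
crms = call_crms(peaks)
print([(c, s, e) for c, s, e, _ in crms])  # [('chr1', 100, 260)]
print(fraction_in_crms(peaks, crms))       # 0.5
```

Comparing CRMs across species, as in the study, would additionally require mapping regions to orthologous coordinates, which this sketch does not attempt.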