Publication


Featured research published by Johan Rung.


Nucleic Acids Research | 2012

ArrayExpress update—trends in database growth and links to data analysis tools

Gabriella Rustici; Nikolay Kolesnikov; Marco Brandizi; Tony Burdett; Miroslaw Dylag; Ibrahim Emam; Anna Farne; Emma Hastings; Jon Ison; Maria Keays; Natalja Kurbatova; James Malone; Roby Mani; Annalisa Mupo; Rui Pedro Pereira; Ekaterina Pilicheva; Johan Rung; Anjan Sharma; Y. Amy Tang; Tobias Ternent; Andrew Tikhonov; Danielle Welter; Eleanor Williams; Alvis Brazma; Helen E. Parkinson; Ugis Sarkans

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.


Bioinformatics | 2012

Tools for mapping high-throughput sequencing data

Nuno A. Fonseca; Johan Rung; Alvis Brazma; John C. Marioni

Motivation: A ubiquitous and fundamental step in high-throughput sequencing analysis is the alignment (mapping) of the generated reads to a reference sequence. Numerous software tools have been proposed for this task, and determining which mappers are most suitable for a specific application is not trivial. Results: This survey classifies mappers according to a wide range of characteristics. The goal is to allow practitioners to compare mappers more easily and find those that are most suitable for their specific problem.


Nature Reviews Genetics | 2013

Reuse of public genome-wide gene expression data

Johan Rung; Alvis Brazma

Our understanding of gene expression has changed dramatically over the past decade, largely catalysed by technological developments. High-throughput experiments — microarrays and next-generation sequencing — have generated large amounts of genome-wide gene expression data that are collected in public archives. Added-value databases process, analyse and annotate these data further to make them accessible to every biologist. In this Review, we discuss the utility of the gene expression data that are in the public domain and how researchers are making use of these data. Reuse of public data can be very powerful, but there are many obstacles in data preparation and analysis and in the interpretation of the results. We will discuss these challenges and provide recommendations that we believe can improve the utility of such data.


Genome Biology | 2013

Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene

Mar Gonzàlez-Porta; Adam Frankish; Johan Rung; Jennifer Harrow; Alvis Brazma

Background: RNA sequencing has opened new avenues for the study of transcriptome composition. Significant evidence has accumulated showing that the human transcriptome contains in excess of a hundred thousand different transcripts. However, it is still not clear to what extent this diversity prevails when considering the relative abundances of different transcripts from the same gene. Results: Here we show that, in a given condition, most protein coding genes have one major transcript expressed at a significantly higher level than others, that in human tissues the major transcripts contribute almost 85 percent of the total mRNA from protein coding loci, and that often the same major transcript is expressed in many tissues. We detect a high degree of overlap between the set of major transcripts and a recently published set of alternatively spliced transcripts that are predicted to be translated, based on proteomic data. Thus, we hypothesize that although some minor transcripts may play a functional role, the major ones are likely to be the main contributors to the proteome. However, we still detect a non-negligible fraction of protein coding genes for which the major transcript does not code for a protein. Conclusions: Overall, our findings suggest that the transcriptome from protein coding loci is dominated by one transcript per gene and that not all the transcripts that contribute to transcriptome diversity are equally likely to contribute to protein diversity. This observation can help to prioritize candidate targets in proteomics research and to predict the functional impact of the detected changes in variation studies.


Nature Genetics | 2015

The impact of low-frequency and rare variants on lipid levels

Ida Surakka; Momoko Horikoshi; Reedik Mägi; Antti-Pekka Sarin; Anubha Mahajan; Vasiliki Lagou; Letizia Marullo; Teresa Ferreira; Benjamin Miraglio; Sanna Timonen; Johannes Kettunen; Matti Pirinen; Juha Karjalainen; Gudmar Thorleifsson; Sara Hägg; Jouke-Jan Hottenga; Aaron Isaacs; Claes Ladenvall; Marian Beekman; Tonu Esko; Janina S. Ried; Christopher P. Nelson; Christina Willenborg; Stefan Gustafsson; Harm-Jan Westra; Matthew Blades; Anton J. M. de Craen; Eco J. C. de Geus; Joris Deelen; Harald Grallert

Using a genome-wide screen of 9.6 million genetic variants achieved through 1000 Genomes Project imputation in 62,166 samples, we identify association to lipid traits in 93 loci, including 79 previously identified loci with new lead SNPs and 10 new loci, 15 loci with a low-frequency lead SNP and 10 loci with a missense lead SNP, and 2 loci with an accumulation of rare variants. In six loci, SNPs with established function in lipid genetics (CELSR2, GCKR, LIPC and APOE) or candidate missense mutations with predicted damaging function (CD300LG and TM6SF2) explained the locus associations. The low-frequency variants increased the proportion of variance explained, particularly for low-density lipoprotein cholesterol and total cholesterol. Altogether, our results highlight the impact of low-frequency variants in complex traits and show that imputation offers a cost-effective alternative to resequencing.


Genome Biology | 2010

Large scale comparison of global gene expression patterns in human and mouse

Xiangqun Zheng-Bradley; Johan Rung; Helen Parkinson; Alvis Brazma

Background: It is widely accepted that orthologous genes between species are conserved at the sequence level and perform similar functions in different organisms. However, the level of conservation of gene expression patterns of the orthologous genes in different species has been unclear. To address the issue, we compared gene expression of orthologous genes based on 2,557 human and 1,267 mouse samples with high quality gene expression data, selected from experiments stored in the public microarray repository ArrayExpress. Results: In a principal component analysis (PCA) of combined data from human and mouse samples merged on orthologous probesets, samples largely form distinctive clusters based on their tissue sources when projected onto the top principal components. The most prominent groups are the nervous system, muscle/heart tissues, liver and cell lines. Despite the great differences in sample characteristics and experiment conditions, the overall patterns of these prominent clusters are strikingly similar for human and mouse. We further analyzed data for each tissue separately and found that the most variable genes in each tissue are highly enriched with human-mouse tissue-specific orthologs and the least variable genes in each tissue are enriched with human-mouse housekeeping orthologs. Conclusions: The results indicate that the global patterns of tissue-specific expression of orthologous genes are conserved in human and mouse. The expression of groups of orthologous genes co-varies in the two species, both for the most variable genes and the most ubiquitously expressed genes.


Human Molecular Genetics | 2014

TCF7L2 is a master regulator of insulin production and processing

Yuedan Zhou; Soo Young Park; Jing Su; Kathleen A. Bailey; Emilia Ottosson-Laakso; Liliya Shcherbina; Nikolay Oskolkov; Enming Zhang; Thomas Thevenin; João Fadista; Hedvig Bennet; Petter Vikman; Nils Wierup; Malin Fex; Johan Rung; Claes B. Wollheim; Marcelo A. Nobrega; Erik Renström; Leif Groop; Ola Hansson

Genome-wide association studies have revealed >60 loci associated with type 2 diabetes (T2D), but the underlying causal variants and functional mechanisms remain largely elusive. Although variants in TCF7L2 confer the strongest risk of T2D among common variants by presumed effects on islet function, the molecular mechanisms are not yet well understood. Using RNA sequencing, we have identified a TCF7L2-regulated transcriptional network responsible for its effect on insulin secretion in rodent and human pancreatic islets. ISL1 is a primary target of TCF7L2 and regulates proinsulin production and processing via MAFA, PDX1, NKX6.1, PCSK1, PCSK2 and SLC30A8, thereby providing evidence for a coordinated regulation of insulin production and processing. The risk T-allele of rs7903146 was associated with increased TCF7L2 expression, and decreased insulin content and secretion. Using gene expression profiles of pancreatic islets from 66 human donors, we also show that the identified TCF7L2-ISL1 transcriptional network is regulated in a genotype-dependent manner. Taken together, these results demonstrate that TCF7L2 regulates not only the synthesis of proinsulin but also the processing and possibly the clearance of proinsulin and insulin. These multiple targets in key pathways may explain why TCF7L2 has emerged as the gene showing one of the strongest associations with T2D.


Nature Communications | 2014

Variation in genomic landscape of clear cell renal cell carcinoma across Europe

Ghislaine Scelo; Yasser Riazalhosseini; Liliana Greger; Louis Letourneau; Mar Gonzàlez-Porta; Magdalena B. Wozniak; M. Bourgey; Patricia Harnden; Lars Egevad; Sharon Jackson; Mehran Karimzadeh; Madeleine Arseneault; P. Lepage; Alexandre How-Kit; Antoine Daunay; Hélène Blanché; E. Tubacher; J. Sehmoun; Juris Viksna; Edgars Celms; Martins Opmanis; Andris Zarins; Naveen S. Vasudev; M. Seywright; Behnoush Abedi-Ardekani; C. Carreira; Peter Selby; J. Cartledge; G. Byrnes; J. Zavadil

The incidence of renal cell carcinoma (RCC) is increasing worldwide, and its prevalence is particularly high in some parts of Central Europe. Here we undertake whole-genome and transcriptome sequencing of clear cell RCC (ccRCC), the most common form of the disease, in patients from four different European countries with contrasting disease incidence to explore the underlying genomic architecture of RCC. Our findings support previous reports on frequent aberrations in the epigenetic machinery and PI3K/mTOR signalling, and uncover novel pathways and genes affected by recurrent mutations and abnormal transcriptome patterns, including focal adhesion, components of the extracellular matrix (ECM) and genes encoding FAT cadherins. Furthermore, a large majority of patients from Romania have an unexpectedly high frequency of A:T>T:A transversions, consistent with exposure to aristolochic acid (AA). These results show that the processes underlying ccRCC tumorigenesis may vary in different populations and suggest that AA may be an important ccRCC carcinogen in Romania, a finding with major public health implications.


PLOS Genetics | 2015

Discovery and fine-mapping of glycaemic and obesity-related trait loci using high-density imputation

Momoko Horikoshi; Reedik Mägi; Martijn van de Bunt; Ida Surakka; Antti-Pekka Sarin; Anubha Mahajan; Letizia Marullo; Gudmar Thorleifsson; Sara Hägg; Jouke-Jan Hottenga; Claes Ladenvall; Janina S. Ried; Thomas W. Winkler; Sara M. Willems; Natalia Pervjakova; Tonu Esko; Marian Beekman; Christopher P. Nelson; Christina Willenborg; Steven Wiltshire; Teresa Ferreira; Juan Fernandez; Kyle J. Gaulton; Valgerdur Steinthorsdottir; Anders Hamsten; Patrik K. E. Magnusson; Gonneke Willemsen; Yuri Milaneschi; Neil R. Robertson; Christopher J. Groves

Reference panels from the 1000 Genomes (1000G) Project Consortium provide near complete coverage of common and low-frequency genetic variation with minor allele frequency ≥0.5% across European ancestry populations. Within the European Network for Genetic and Genomic Epidemiology (ENGAGE) Consortium, we have undertaken the first large-scale meta-analysis of genome-wide association studies (GWAS), supplemented by 1000G imputation, for four quantitative glycaemic and obesity-related traits, in up to 87,048 individuals of European ancestry. We identified two loci for body mass index (BMI) at genome-wide significance, and two for fasting glucose (FG), none of which has been previously reported in larger meta-analysis efforts to combine GWAS of European ancestry. Through conditional analysis, we also detected multiple distinct signals of association mapping to established loci for waist-hip ratio adjusted for BMI (RSPO3) and FG (GCK and G6PC2). The index variant for one association signal at the G6PC2 locus is a low-frequency coding allele, H177Y, which has recently been demonstrated to have a functional role in glucose regulation. Fine-mapping analyses revealed that the non-coding variants most likely to drive association signals at established and novel loci were enriched for overlap with enhancer elements, which for FG mapped to promoter and transcription factor binding sites in pancreatic islets, in particular. Our study demonstrates that 1000G imputation and genetic fine-mapping of common and low-frequency variant association signals at GWAS loci, integrated with genomic annotation in relevant tissues, can provide insight into the functional and regulatory mechanisms through which their effects on glycaemic and obesity-related traits are mediated.


Bioinformatics | 2009

A System for Information Management in BioMedical Studies – SIMBioMS

Maria Krestyaninova; Andris Zarins; Juris Viksna; Natalja Kurbatova; Peteris Rucevskis; Sudeshna Guha Neogi; Mike Gostev; Teemu Perheentupa; Juha Knuuttila; Amy Barrett; Ilkka Lappalainen; Johan Rung; Karlis Podnieks; Ugis Sarkans; Mark I. McCarthy; Alvis Brazma

Summary: SIMBioMS is a web-based open source software system for managing data and information in biomedical studies. It provides a solution for the collection, storage, management and retrieval of information about research subjects and biomedical samples, as well as experimental data obtained using a range of high-throughput technologies, including gene expression, genotyping, proteomics and metabonomics. The system can easily be customized and has proven to be successful in several large-scale multi-site collaborative projects. It is compatible with emerging functional genomics data standards and provides data import and export in accepted standard formats. Protocols for transferring data to durable archives at the European Bioinformatics Institute have been implemented. Availability: The source code, documentation and initialization scripts are available at http://simbioms.org.

Collaboration


Johan Rung's collaborations.

Top Co-Authors

Alvis Brazma

European Bioinformatics Institute

Jing Su

European Bioinformatics Institute

Mar Gonzàlez-Porta

European Bioinformatics Institute

Maria Krestyaninova

European Bioinformatics Institute

Janina S. Ried

University of Washington
