Publication


Featured research published by Diana Domanska.


GigaScience | 2017

GSuite HyperBrowser: Integrative analysis of dataset collections across the genome and epigenome

Boris Simovski; Daniel Vodák; Sveinung Gundersen; Diana Domanska; Abdulrahman Azab; Lars Holden; Marit Holden; Ivar Grytten; Knut Dagestad Rand; Finn Drabløs; Morten Johansen; Antonio Mora; Christin Lund-Andersen; Bastian Fromm; Ragnhild Eskeland; Odd S. Gabrielsen; Egil Ferkingstad; Sigve Nakken; Mads Bengtsen; Hildur Sif Thorarensen; Johannes Andreas Akse; Ingrid K. Glad; Eivind Hovig; Geir Kjetil Sandve

Abstract Background: Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. Findings: We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Conclusions: Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. The software is available at: https://hyperbrowser.uio.no.


bioRxiv | 2018

MirGeneDB2.0: the curated microRNA Gene Database

Bastian Fromm; Diana Domanska; Michael Hackenberg; Anthony Mathelier; Eirik Høye; Morten Johansen; Eivind Hovig; Kjersti Flatmark; Kevin J. Peterson

Non-coding RNAs (ncRNA), a significant part of the increasingly popular dark matter of the human genome, have gained substantial attention due to their involvement in animal development and human disorders such as cardiovascular diseases and cancer. Although many different types of regulatory ncRNAs have been discovered over the last 25 years, microRNAs (miRNAs) are unique within these as they are the only class of ncRNAs with individual genes sequentially conserved across the animal kingdom. Because of the conserved roles miRNAs play in establishing robustness of gene regulatory networks across Metazoa, it is important that homologous miRNAs in different species are correctly identified, annotated, and named using consistent criteria against the backdrop of numerous other types of coding and non-coding RNA fragments.

Small non-coding RNAs have gained substantial attention due to their roles in animal development and human disorders. Among them, microRNAs are unique because individual gene sequences are conserved across the animal kingdom. In addition, unique and mechanistically well understood features can clearly distinguish bona fide miRNAs from the myriad other small RNAs generated by cells. However, making this separation is not a common practice and, thus, not surprisingly, the heterogeneous quality of available miRNA complements has become a major concern in microRNA research. We addressed this by extensively expanding our curated microRNA gene database MirGeneDB to 45 organisms that represent the full taxonomic breadth of Metazoa. By consistently annotating and naming more than 10,900 microRNA genes in these organisms, we show that previous microRNA annotations contained not only many false positives, but surprisingly lacked more than 2,100 bona fide microRNAs. Indeed, curated microRNA complements of closely related organisms are very similar and can be used to reconstruct Metazoan evolution.
MirGeneDB represents a robust platform for microRNA-based research, providing deeper and more significant insights into the biology and evolution of miRNAs but also biomedical and biomarker research. MirGeneDB is publicly and freely available at http://mirgenedb.org/.


Scientific Reports | 2017

Uracil Accumulation and Mutagenesis Dominated by Cytosine Deamination in CpG Dinucleotides in Mice Lacking UNG and SMUG1

Lene Alsøe; Antonio Sarno; Sergio Carracedo; Diana Domanska; Felix A. Dingler; Lisa Lirussi; Tanima SenGupta; Nuriye Basdag Tekin; Laure Jobert; Ludmil B. Alexandrov; Anastasia Galashevskaya; Cristina Rada; Geir Kjetil Sandve; Torbjørn Rognes; Hans E. Krokan; Hilde Nilsen

Both a DNA lesion and an intermediate for antibody maturation, uracil is primarily processed by base excision repair (BER), either initiated by uracil-DNA glycosylase (UNG) or by single-strand selective monofunctional uracil DNA glycosylase (SMUG1). The relative in vivo contributions of each glycosylase remain elusive. To assess the impact of SMUG1 deficiency, we measured uracil and 5-hydroxymethyluracil, another SMUG1 substrate, in Smug1−/− mice. We found that 5-hydroxymethyluracil accumulated in Smug1−/− tissues and correlated with 5-hydroxymethylcytosine levels. The highest increase was found in brain, which contained about 26-fold higher genomic 5-hydroxymethyluracil levels than the wild type. Smug1−/− mice did not accumulate uracil in their genome and Ung−/− mice showed slightly elevated uracil levels. Contrastingly, Ung−/−Smug1−/− mice showed a synergistic increase in uracil levels with up to 25-fold higher uracil levels than wild type. Whole genome sequencing of UNG/SMUG1-deficient tumours revealed that combined UNG and SMUG1 deficiency leads to the accumulation of mutations, primarily C to T transitions within CpG sequences. This unexpected sequence bias suggests that CpG dinucleotides are intrinsically more mutation prone. In conclusion, we showed that SMUG1 efficiently prevents genomic uracil accumulation, even in the presence of UNG, and identified mutational signatures associated with combined UNG and SMUG1 deficiency.


BMC Bioinformatics | 2017

The rainfall plot: its motivation, characteristics and pitfalls

Diana Domanska; Daniel Vodák; Christin Lund-Andersen; Stefania Salvatore; Eivind Hovig; Geir Kjetil Sandve

Background: A visualization referred to as rainfall plot has recently gained popularity in genome data analysis. The plot is mostly used for illustrating the distribution of somatic cancer mutations along a reference genome, typically aiming to identify mutation hotspots. In general terms, the rainfall plot can be seen as a scatter plot showing the location of events on the x-axis versus the distance between consecutive events on the y-axis. Despite its frequent use, the motivation for applying this particular visualization and the appropriateness of its usage have never been critically addressed in detail. Results: We show that the rainfall plot allows visual detection even for events occurring at high frequency over very short distances. In addition, event clustering at multiple scales may be detected as distinct horizontal bands in rainfall plots. At the same time, due to the limited size of standard figures, rainfall plots might suffer from inability to distinguish overlapping events, especially when multiple datasets are plotted in the same figure. We demonstrate the consequences of plot congestion, which results in obscured visual data interpretations. Conclusions: This work provides the first comprehensive survey of the characteristics and proper usage of rainfall plots. We find that the rainfall plot is able to convey a large amount of information without any need for parameterization or tuning. However, we also demonstrate how plot congestion and the use of a logarithmic y-axis may result in obscured visual data interpretations. To aid the productive utilization of rainfall plots, we demonstrate their characteristics and potential pitfalls using both simulated and real data, and provide a set of practical guidelines for their proper interpretation and usage.
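As a minimal sketch of the definition given in this abstract, the coordinates of a rainfall plot can be derived from a list of event positions. The function name `rainfall_points` and the example positions are illustrative assumptions, not taken from the paper. Each distance is paired here with the later event of a consecutive pair, which is one possible convention; plotting itself is left out.

```python
from typing import List, Tuple


def rainfall_points(positions: List[int]) -> List[Tuple[int, int]]:
    """Return (x, y) pairs for a rainfall plot.

    x is the position of an event, y is the distance to the previous event.
    The first event has no predecessor and is therefore omitted.
    """
    ordered = sorted(positions)
    return [(cur, cur - prev) for prev, cur in zip(ordered, ordered[1:])]


# Hypothetical event positions on a single chromosome.
events = [100, 105, 107, 5000, 5003, 90000]
points = rainfall_points(events)
```

Closely spaced events yield small y-values, so a local cluster appears as a group of points near the bottom of the plot. The y-values are often drawn on a logarithmic axis, a choice whose pitfalls the abstract explicitly discusses.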


Cluster Computing and the Grid | 2016

Software Provisioning Inside a Secure Environment as Docker Containers Using Stroll File-System

Abdulrahman Azab; Diana Domanska

TSD (Tjenester for Sensitive Data, "Services for Sensitive Data") is an isolated infrastructure for storing and processing sensitive research data, e.g. human patient genomics data. Due to the isolation of the TSD, it is not possible to install software in the traditional fashion. Docker is a platform implementing lightweight virtualization technology for applying the build-once-run-anywhere approach in software packaging and sharing. This paper describes our experience at USIT (The University Centre of Information Technology) at the University of Oslo with Docker containers as a solution for installing and running software packages that require downloading of dependencies and binaries during the installation, inside a secure isolated infrastructure. Using Docker containers made it possible to package software as Docker images and run them smoothly inside our secure system, TSD. The paper describes Docker as a technology, along with its benefits and weaknesses in terms of security, demonstrates our experience with a use case for installing and running the Galaxy bioinformatics portal as a Docker container inside the TSD, and investigates the use of the Stroll file-system as a proxy between the Galaxy portal and the HPC cluster.


Ecological Informatics | 2016

Handling high-dimensional data in air pollution forecasting tasks

Diana Domanska; Szymon Łukasik

This paper proposes methods for handling high-dimensional weather forecast data used to predict the concentrations of PM10, PM2.5, SO2, NO, CO and O3. The procedure employed to predict pollution normally requires historical data samples for a large number of points in time, particularly weather forecast data, actual weather data and pollution data. Likewise, it typically involves using numerous features related to atmospheric conditions. Consequently, the analysis of such datasets to generate accurate forecasts becomes a very cumbersome task. The paper examines a variety of unsupervised dimensionality reduction methods aimed at obtaining a compact yet informative set of features. As an alternative, an approach using fractional distances for data analysis tasks is considered as well. Both strategies were evaluated on real-world data obtained from the Institute of Meteorology and Water Management in Katowice (Poland), with the extended Air Pollution Forecast Model (e-APFM) used as the underlying prediction tool. It was found that employing fractional distance as a dissimilarity measure ensures the best forecasting accuracy. Satisfactory results can also be obtained with Isomap, Landmark Isomap and Factor Analysis as dimensionality reduction techniques. These methods can also be used to formulate a universal mapping, ready to use for data gathered in different geographical areas.
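A fractional distance is commonly understood as a Minkowski-type dissimilarity with an exponent p strictly between 0 and 1. The sketch below assumes that definition; the function name, the default p = 0.5 and the sample vectors are hypothetical, and the exact variant and exponent used in the paper may differ.

```python
from typing import Sequence


def fractional_distance(a: Sequence[float], b: Sequence[float], p: float = 0.5) -> float:
    """Minkowski-style dissimilarity with 0 < p < 1.

    For p < 1 the triangle inequality no longer holds, so this is a
    dissimilarity measure rather than a true metric.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Two hypothetical feature vectors differing by 1 in each of two features.
d = fractional_distance([0.0, 0.0], [1.0, 1.0], p=0.5)
```

For this pair the value is (1 + 1)^2 = 4, compared with a Euclidean distance of about 1.41; exponents below 1 are often motivated by the way ordinary distances lose contrast in high-dimensional feature spaces.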


bioRxiv | 2018

Mind your gaps: Overlooking assembly gaps confounds statistical testing in genome analysis

Diana Domanska; Chakravarthi Kanduri; Boris Simovski; Geir Kjetil Sandve

Background: The difficulties associated with sequencing and assembling some regions of the DNA sequence result in gaps in the reference genomes that are typically represented as stretches of Ns. Although the presence of assembly gaps causes a slight reduction in the mapping rate in many experimental settings, that does not invalidate the typical statistical testing comparing read count distributions across experimental conditions. However, we hypothesize that not handling assembly gaps in the null model may confound statistical testing of co-localization of genomic features. Results: First, we performed a series of explorative analyses to understand whether and how the public genomic tracks intersect the assembly gaps track (hg19). The findings confirm that the genomic regions in public genomic tracks intersect very little with assembly gaps and that the intersection was observed only at the beginning and end regions of the assembly gaps rather than covering the whole gap sizes. Further, we simulated a set of query and reference genomic tracks in a way that nullified any dependence between them to test our hypothesis that not avoiding assembly gaps in the null model would result in spurious inflation of statistical significance. We then contrasted the distributions of test statistics and p-values of Monte Carlo simulation-based permutation tests that either avoided or did not avoid assembly gaps in the null model when testing for significant co-localization between a pair of query and reference tracks. We observed that the statistical tests that did not account for the assembly gaps in the null model resulted in a distribution of the test statistic that is shifted to the right and a distribution of p-values that is shifted to the left (leading to inflated significance). Conclusion: Our results show that not accounting for assembly gaps in statistical testing of co-localization analysis may lead to false positives and over-optimistic findings.
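The contrast between a gap-aware and a gap-unaware null model can be sketched with a small Monte Carlo permutation test. Everything in this sketch is an illustrative assumption rather than the authors' implementation: the function names, the single linear genome, the rejection-sampling placement, and the test statistic (number of query regions that overlap any reference region).

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open interval [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def shuffle_query(lengths: List[int], genome_len: int, gaps: List[Interval],
                  avoid_gaps: bool, rng: random.Random) -> List[Interval]:
    """Place each query region uniformly at random, optionally rejecting gap hits.

    Assumes the gaps leave enough room for every region to be placed.
    """
    placed = []
    for length in lengths:
        while True:
            start = rng.randrange(0, genome_len - length + 1)
            candidate = (start, start + length)
            if not avoid_gaps or not any(overlaps(candidate, g) for g in gaps):
                placed.append(candidate)
                break
    return placed


def count_hits(query: List[Interval], reference: List[Interval]) -> int:
    """Test statistic: number of query regions overlapping any reference region."""
    return sum(any(overlaps(q, r) for r in reference) for q in query)


def permutation_p_value(query: List[Interval], reference: List[Interval],
                        genome_len: int, gaps: List[Interval],
                        avoid_gaps: bool, n_perm: int = 200, seed: int = 0) -> float:
    """One-sided Monte Carlo p-value for co-localization of query and reference."""
    rng = random.Random(seed)
    observed = count_hits(query, reference)
    lengths = [end - start for start, end in query]
    extreme = 0
    for _ in range(n_perm):
        null_query = shuffle_query(lengths, genome_len, gaps, avoid_gaps, rng)
        if count_hits(null_query, reference) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)
```

Calling `permutation_p_value` twice on the same tracks, once with `avoid_gaps=True` and once with `avoid_gaps=False`, gives the two null models being compared. When reference regions never fall inside gaps, shuffled query regions that land in gaps cannot score hits, which is the mechanism the abstract describes as inflating significance.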


Nucleic Acids Research | 2018

Coloc-stats: a unified web interface to perform colocalization analysis of genomic features

Boris Simovski; Chakravarthi Kanduri; Sveinung Gundersen; Dmytro Titov; Diana Domanska; Christoph Bock; Lara Bossini-Castillo; Maria Chikina; Alexander V. Favorov; Ryan M. Layer; Andrey A. Mironov; Aaron R. Quinlan; Nathan C. Sheffield; Gosia Trynka; Geir Kjetil Sandve

Abstract Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.


PLOS ONE | 2017

Complex patterns of concomitant medication use: A study among Norwegian women using paracetamol during pregnancy

Stefania Salvatore; Diana Domanska; Mollie Wood; Hedvig Nordeng; Geir Kjetil Sandve

Background: Studies on medication safety in pregnancy often rely on an oversimplification of medication use into exposed or non-exposed, without considering intensity and timing of use in pregnancy, or concomitant medication use. This study uses paracetamol in pregnancy as the motivating example to introduce a method of clustering medication exposures longitudinally throughout pregnancy. The aim of this study was to use hierarchical cluster analysis (HCA) to better identify clusters of medication exposure throughout pregnancy. Methods: Data from the Norwegian Mother and Child Cohort Study was used to identify subclasses of women using paracetamol during pregnancy. HCA with a customized distance measure was used to identify clusters of medication exposures in pregnancy among children at 18 months. Results: The pregnancies in the study (N = 9 778) were grouped in 5 different clusters depending on their medication exposure profile throughout pregnancy. Conclusion: Using HCA, we identified and described profiles of women exposed to different medications in combination with paracetamol during pregnancy. Identifying these clusters allows researchers to define exposure in ways that better reflect real-world medication usage patterns. This method could be extended to other medications and used as pre-analysis for identifying risks associated with different profiles of exposure.
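The general idea of hierarchical clustering with a user-supplied distance can be sketched as follows. The abstract does not specify the customized distance measure or the linkage, so the mismatch-based distance, the average linkage, the binary exposure profiles and all names below are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence

Profile = Sequence[int]  # 1 = medication used in that period, 0 = not used


def mismatch_distance(a: Profile, b: Profile) -> float:
    """Hypothetical custom distance: fraction of periods where exposure differs."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def agglomerate(profiles: List[Profile], n_clusters: int,
                dist: Callable[[Profile, Profile], float] = mismatch_distance
                ) -> List[List[int]]:
    """Average-linkage agglomerative clustering; returns lists of profile indices."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        total = sum(dist(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters; ties resolve to the lowest indices.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Four hypothetical exposure profiles over four time periods.
profiles = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]
groups = agglomerate(profiles, n_clusters=2)
```

In this toy input the first two profiles share early exposure and the last two share late exposure, so they end up in separate groups. Swapping in a different `dist` function is how a domain-specific notion of similarity between exposure patterns would enter the analysis.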


Genome Biology | 2017

Genome build information is an essential part of genomic track files

Chakravarthi Kanduri; Diana Domanska; Eivind Hovig; Geir Kjetil Sandve

Genomic locations are represented as coordinates on a specific genome build version, but the build information is frequently missing when coordinates are provided. We show that this information is essential to correctly interpret and analyse the genomic intervals contained in genomic track files. Although not a substitute for best practices, we also provide a tool to predict the genome build version of genomic track files.

Collaboration

Top co-authors of Diana Domanska:

Eivind Hovig, Oslo University Hospital
Bastian Fromm, Oslo University Hospital
Daniel Vodák, Oslo University Hospital