Geir Kjetil Sandve | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Geir Kjetil Sandve is active.

Explore More

Publication

Featured researches published by Geir Kjetil Sandve.

PLOS Computational Biology | 2013

Ten Simple Rules for Reproducible Computational Research

Geir Kjetil Sandve; Anton Nekrutenko; R. James Taylor; Eivind Hovig

Replication is the cornerstone of a cumulative science [1]. However, new tools and technologies, massive amounts of data, interdisciplinary approaches, and the complexity of the questions being asked are complicating replication efforts, as are increased pressures on scientists to advance their research [2]. As full replication of studies on independently collected data is often not feasible, there has recently been a call for reproducible research as an attainable minimum standard for assessing the value of scientific claims [3]. This requires that papers in experimental science describe the results and provide a sufficiently clear protocol to allow successful repetition and extension of analyses based on original data [4]. The importance of replication and reproducibility has recently been exemplified through studies showing that scientific papers commonly leave out experimental details essential for reproduction [5], studies showing difficulties with replicating published experimental results [6], an increase in retracted papers [7], and through a high number of failing clinical trials [8], [9]. This has led to discussions on how individual researchers, institutions, funding bodies, and journals can establish routines that increase transparency and reproducibility. In order to foster such aspects, it has been suggested that the scientific community needs to develop a “culture of reproducibility” for computational science, and to require it for published claims [3]. We want to emphasize that reproducibility is not only a moral responsibility with respect to the scientific field, but that a lack of reproducibility can also be a burden for you as an individual researcher. As an example, a good practice of reproducibility is necessary in order to allow previously developed methodology to be effectively applied on new data, or to allow reuse of code and results for new projects. In other words, good habits of reproducibility may actually turn out to be a time-saver in the longer run. We further note that reproducibility is just as much about the habits that ensure reproducible research as the technologies that can make these processes efficient and realistic. Each of the following ten rules captures a specific aspect of reproducibility, and discusses what is needed in terms of information handling and tracking of procedures. If you are taking a bare-bones approach to bioinformatics analysis, i.e., running various custom scripts from the command line, you will probably need to handle each rule explicitly. If you are instead performing your analyses through an integrated framework (such as GenePattern [10], Galaxy [11], LONI pipeline [12], or Taverna [13]), the system may already provide full or partial support for most of the rules. What is needed on your part is then merely the knowledge of how to exploit these existing possibilities. In a pragmatic setting, with publication pressure and deadlines, one may face the need to make a trade-off between the ideals of reproducibility and the need to get the research out while it is still relevant. This trade-off becomes more important when considering that a large part of the analyses being tried out never end up yielding any results. However, frequently one will, with the wisdom of hindsight, contemplate the missed opportunity to ensure reproducibility, as it may already be too late to take the necessary notes from memory (or at least much more difficult than to do it while underway). We believe that the rewards of reproducibility will compensate for the risk of having spent valuable time developing an annotated catalog of analyses that turned out as blind alleys. As a minimal requirement, you should at least be able to reproduce the results yourself. This would satisfy the most basic requirements of sound research, allowing any substantial future questioning of the research to be met with a precise explanation. Although it may sound like a very weak requirement, even this level of reproducibility will often require a certain level of care in order to be met. There will for a given analysis be an exponential number of possible combinations of software versions, parameter values, pre-processing steps, and so on, meaning that a failure to take notes may make exact reproduction essentially impossible. With this basic level of reproducibility in place, there is much more that can be wished for. An obvious extension is to go from a level where you can reproduce results in case of a critical situation to a level where you can practically and routinely reuse your previous work and increase your productivity. A second extension is to ensure that peers have a practical possibility of reproducing your results, which can lead to increased trust in, interest for, and citations of your work [6], [14]. We here present ten simple rules for reproducibility of computational research. These rules can be at your disposal for whenever you want to make your research more accessible—be it for peers or for your future self.

Biology Direct | 2006

A survey of motif discovery methods in an integrated framework.

Geir Kjetil Sandve; Finn Drabløs

BackgroundThere has been a growing interest in computational discovery of regulatory elements, and a multitude of motif discovery methods have been proposed. Computational motif discovery has been used with some success in simple organisms like yeast. However, as we move to higher organisms with more complex genomes, more sensitive methods are needed. Several recent methods try to integrate additional sources of information, including microarray experiments (gene expression and ChlP-chip). There is also a growing awareness that regulatory elements work in combination, and that this combinatorial behavior must be modeled for successful motif discovery. However, the multitude of methods and approaches makes it difficult to get a good understanding of the current status of the field.ResultsThis paper presents a survey of methods for motif discovery in DNA, based on a structured and well defined framework that integrates all relevant elements. Existing methods are discussed according to this framework.ConclusionThe survey shows that although no single method takes all relevant elements into consideration, a very large number of different models treating the various elements separately have been tried. Very often the choices that have been made are not explicitly stated, making it difficult to compare different implementations. Also, the tests that have been used are often not comparable. Therefore, a stringent framework and improved test methods are needed to evaluate the different approaches in order to conclude which ones are most promising.Reviewers: This article was reviewed by Eugene V. Koonin, Philipp Bucher (nominated by Mikhail Gelfand) and Frank Eisenhaber.

Genome Biology | 2010

The Genomic HyperBrowser: inferential genomics at the sequence level

Geir Kjetil Sandve; Sveinung Gundersen; Halfdan Rydbeck; Ingrid K. Glad; Lars Holden; Marit Holden; Knut Liestøl; Trevor Clancy; Egil Ferkingstad; Morten Johansen; Vegard Nygaard; Eivind Tøstesen; Arnoldo Frigessi; Eivind Hovig

The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no.

BMC Bioinformatics | 2007

Improved benchmarks for computational motif discovery

Geir Kjetil Sandve; Osman Abul; Vegard Walseng; Finn Drabløs

BackgroundAn important step in annotation of sequenced genomes is the identification of transcription factor binding sites. More than a hundred different computational methods have been proposed, and it is difficult to make an informed choice. Therefore, robust assessment of motif discovery methods becomes important, both for validation of existing tools and for identification of promising directions for future research.ResultsWe use a machine learning perspective to analyze collections of transcription factors with known binding sites. Algorithms are presented for finding position weight matrices (PWMs), IUPAC-type motifs and mismatch motifs with optimal discrimination of binding sites from remaining sequence. We show that for many data sets in a recently proposed benchmark suite for motif discovery, none of the common motif models can accurately discriminate the binding sites from remaining sequence. This may obscure the distinction between the potential performance of the motif discovery tool itself versus the intrinsic complexity of the problem we are trying to solve. Synthetic data sets may avoid this problem, but we show on some previously proposed benchmarks that there may be a strong bias towards a presupposed motif model. We also propose a new approach to benchmark data set construction. This approach is based on collections of binding site fragments that are ranked according to the optimal level of discrimination achieved with our algorithms. This allows us to select subsets with specific properties. We present one benchmark suite with data sets that allow good discrimination between positive and negative instances with the common motif models. These data sets are suitable for evaluating algorithms for motif discovery that rely on these models. We present another benchmark suite where PWM, IUPAC and mismatch motif models are not able to discriminate reliably between positive and negative instances. This suite could be used for evaluating more powerful motif models.ConclusionOur improved benchmark suites have been designed to differentiate between the performance of motif discovery algorithms and the power of motif models. We provide a web server where users can download our benchmark suites, submit predictions and visualize scores on the benchmarks.

BMC Bioinformatics | 2008

Assessment of composite motif discovery methods

Kjetil Klepper; Geir Kjetil Sandve; Osman Abul; Jostein Johansen; Finn Drabløs

BackgroundComputational discovery of regulatory elements is an important area of bioinformatics research and more than a hundred motif discovery methods have been published. Traditionally, most of these methods have addressed the problem of single motif discovery – discovering binding motifs for individual transcription factors. In higher organisms, however, transcription factors usually act in combination with nearby bound factors to induce specific regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets of motifs bound by multiple cooperating transcription factors, so called composite motifs or cis-regulatory modules. Given the large number and diversity of methods available, independent assessment of methods becomes important. Although there have been several benchmark studies of single motif discovery, no similar studies have previously been conducted concerning composite motif discovery.ResultsWe have developed a benchmarking framework for composite motif discovery and used it to evaluate the performance of eight published module discovery tools. Benchmark datasets were constructed based on real genomic sequences containing experimentally verified regulatory modules, and the module discovery programs were asked to predict both the locations of these modules and to specify the single motifs involved. To aid the programs in their search, we provided position weight matrices corresponding to the binding motifs of the transcription factors involved. In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to test the response of programs to varying levels of noise.ConclusionAlthough some of the methods tested tended to score somewhat better than others overall, there were still large variations between individual datasets and no single method performed consistently better than the rest in all situations. The variation in performance on individual datasets also shows that the new benchmark datasets represents a suitable variety of challenges to most methods for module discovery.

BMC Medicine | 2013

Vitamin D receptor ChIP-seq in primary CD4+ cells: relationship to serum 25-hydroxyvitamin D levels and autoimmune disease.

Adam E. Handel; Geir Kjetil Sandve; Giulio Disanto; Antonio J. Berlanga-Taylor; Giuseppe Gallone; Heather Hanwell; Finn Drabløs; Gavin Giovannoni; George C. Ebers; Sreeram V. Ramagopalan

BackgroundVitamin D insufficiency has been implicated in autoimmunity. ChIP-seq experiments using immune cell lines have shown that vitamin D receptor (VDR) binding sites are enriched near regions of the genome associated with autoimmune diseases. We aimed to investigate VDR binding in primary CD4+ cells from healthy volunteers.MethodsWe extracted CD4+ cells from nine healthy volunteers. Each sample underwent VDR ChIP-seq. Our results were analyzed in relation to published ChIP-seq and RNA-seq data in the Genomic HyperBrowser. We used MEMEChIP for de novo motif discovery. 25-Hydroxyvitamin D levels were measured using liquid chromatography–tandem mass spectrometry and samples were divided into vitamin D sufficient (25(OH)D ≥75 nmol/L) and insufficient/deficient (25(OH)D <75 nmol/L) groups.ResultsWe found that the amount of VDR binding is correlated with the serum level of 25-hydroxyvitamin D (r = 0.92, P= 0.0005). In vivo VDR binding sites are enriched for autoimmune disease associated loci, especially when 25-hydroxyvitamin D levels (25(OH)D) were sufficient (25(OH)D ≥75: 3.13-fold, P<0.0001; 25(OH)D <75: 2.76-fold, P<0.0001; 25(OH)D ≥75 enrichment versus 25(OH)D <75 enrichment: P= 0.0002). VDR binding was also enriched near genes associated specifically with T-regulatory and T-helper cells in the 25(OH)D ≥75 group. MEME ChIP did not identify any VDR-like motifs underlying our VDR ChIP-seq peaks.ConclusionOur results show a direct correlation between in vivo 25-hydroxyvitamin D levels and the number of VDR binding sites, although our sample size is relatively small. Our study further implicates VDR binding as important in gene-environment interactions underlying the development of autoimmunity and provides a biological rationale for 25-hydroxyvitamin D sufficiency being based at 75 nmol/L. Our results also suggest that VDR binding in response to physiological levels of vitamin D occurs predominantly in a VDR motif-independent manner.

Human Molecular Genetics | 2012

Vitamin D receptor binding, chromatin states and association with multiple sclerosis

Giulio Disanto; Geir Kjetil Sandve; Antonio J. Berlanga-Taylor; Giammario Ragnedda; Julia M. Morahan; Corey T. Watson; Gavin Giovannoni; George C. Ebers; Sreeram V. Ramagopalan

Both genetic and environmental factors contribute to the aetiology of multiple sclerosis (MS). More than 50 genomic regions have been associated with MS susceptibility and vitamin D status also influences the risk of this complex disease. However, how these factors interact in disease causation is unclear. We aimed to investigate the relationship between vitamin D receptor (VDR) binding in lymphoblastoid cell lines (LCLs), chromatin states in LCLs and MS-associated genomic regions. Using the Genomic Hyperbrowser, we found that VDR-binding regions overlapped with active regulatory regions [active promoter (AP) and strong enhancer (SE)] in LCLs more than expected by chance [45.3-fold enrichment for SE (P < 2.0e−05) and 63.41-fold enrichment for AP (P < 2.0e−05)]. Approximately 77% of VDR regions were covered by either AP or SE elements. The overlap between VDR binding and regulatory elements was significantly greater in LCLs than in non-immune cells (P < 2.0e−05). VDR binding also occurred within MS regions more than expected by chance (3.7-fold enrichment, P < 2.0e−05). Furthermore, regions of joint overlap SE-VDR and AP-VDR were even more enriched within MS regions and near to several disease-associated genes. These findings provide relevant insights into how vitamin D influences the immune system and the risk of MS through VDR interactions with the chromatin state inside MS regions. Furthermore, the data provide additional evidence for an important role played by B cells in MS. Further analyses in other immune cell types and functional studies are warranted to fully elucidate the role of vitamin D in the immune system.

Nucleic Acids Research | 2013

The Genomic HyperBrowser: an analysis web server for genome-scale data

Geir Kjetil Sandve; Sveinung Gundersen; Morten Johansen; Ingrid K. Glad; Krishanthi Gunathasan; Lars Holden; Marit Holden; Knut Liestøl; Ståle Nygård; Vegard Nygaard; Jonas Paulsen; Halfdan Rydbeck; Kai Trengereid; Trevor Clancy; Finn Drabløs; Egil Ferkingstad; Matúš Kalaš; Tonje G. Lien; Morten Beck Rye; Arnoldo Frigessi; Eivind Hovig

The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens for a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.

Bioinformatics | 2014

HiBrowse: multi-purpose statistical analysis of genome-wide chromatin 3D organization

Jonas Paulsen; Geir Kjetil Sandve; Sveinung Gundersen; Tonje G. Lien; Kai Trengereid; Eivind Hovig

Summary: Recently developed methods that couple next-generation sequencing with chromosome conformation capture-based techniques, such as Hi-C and ChIA-PET, allow for characterization of genome-wide chromatin 3D structure. Understanding the organization of chromatin in three dimensions is a crucial next step in the unraveling of global gene regulation, and methods for analyzing such data are needed. We have developed HiBrowse, a user-friendly web-tool consisting of a range of hypothesis-based and descriptive statistics, using realistic assumptions in null-models. Availability and implementation: HiBrowse is supported by all major browsers, and is freely available at http://hyperbrowser.uio.no/3d. Software is implemented in Python, and source code is available for download by following instructions on the main site. Contact: [email protected] Supplementary Information: Supplementary data are available at Bioinformatics online.

Bioinformatics | 2011

Sequential Monte Carlo multiple testing

Geir Kjetil Sandve; Egil Ferkingstad; Ståle Nygård

Motivation: In molecular biology, as in many other scientific fields, the scale of analyses is ever increasing. Often, complex Monte Carlo simulation is required, sometimes within a large-scale multiple testing setting. The resulting computational costs may be prohibitively high. Results: We here present MCFDR, a simple, novel algorithm for false discovery rate (FDR) modulated sequential Monte Carlo (MC) multiple hypothesis testing. The algorithm iterates between adding MC samples across tests and calculating intermediate FDR values for the collection of tests. MC sampling is stopped either by sequential MC or based on a threshold on FDR. An essential property of the algorithm is that it limits the total number of MC samples whatever the number of true null hypotheses. We show on both real and simulated data that the proposed algorithm provides large gains in computational efficiency. Availability: MCFDR is implemented in the Genomic HyperBrowser (http://hyperbrowser.uio.no/mcfdr), a web-based system for genome analysis. All input data and results are available and can be reproduced through a Galaxy Pages document at: http://hyperbrowser.uio.no/mcfdr/u/sandve/p/mcfdr. Contact: [email protected]

Explore More