Sarah Sandmann
University of Münster
Publication
Featured research published by Sarah Sandmann.
Scientific Reports | 2017
Sarah Sandmann; A.O. de Graaf; Mohsen Karimi; B.A. van der Reijden; E. Hellström-Lindberg; Joop H. Jansen; Martin Dugas
Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, on a different platform and expert-based review. In addition, we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. Influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.
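The trade-off between sensitivity and precision described in this abstract can be illustrated with a minimal Python sketch. It is not the evaluation code used in the study. It assumes that a tool's calls and the validated mutations are available as sets of hypothetical (chromosome, position, reference, alternative) tuples, and the function name `sensitivity_and_ppv` is invented for illustration.

```python
def sensitivity_and_ppv(called, validated):
    """Return (sensitivity, positive predictive value) for one calling tool.

    called    -- iterable of variants reported by the tool
    validated -- iterable of variants confirmed as true mutations
    """
    called, validated = set(called), set(validated)
    true_pos = len(called & validated)    # reported and confirmed
    false_pos = len(called - validated)   # reported but not confirmed
    false_neg = len(validated - called)   # confirmed but missed
    sensitivity = true_pos / (true_pos + false_neg) if validated else float("nan")
    ppv = true_pos / (true_pos + false_pos) if called else float("nan")
    return sensitivity, ppv


# Hypothetical example data, not taken from the paper.
validated = {
    ("chr1", 100, "A", "T"),
    ("chr2", 200, "G", "C"),
    ("chr3", 300, "C", "CT"),
}
called = {
    ("chr1", 100, "A", "T"),
    ("chr2", 200, "G", "C"),
    ("chr4", 400, "T", "G"),
    ("chr5", 500, "A", "G"),
}
sens, ppv = sensitivity_and_ppv(called, validated)
print(f"sensitivity={sens:.2f}, precision={ppv:.2f}")
```

In this toy case the tool finds two of three true mutations but also reports two artifacts, so reporting more candidate calls would raise sensitivity only at the cost of precision.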
Nature Communications | 2017
Pedro da Silva-Coelho; Kenichi Yoshida; Theresia N. Koorenhof-Scheele; Ruth Knops; Louis van de Locht; Aniek O. de Graaf; Marion Massop; Sarah Sandmann; Martin Dugas; Marian Stevens-Kroef; Jaroslav Cermak; Yuichi Shiraishi; Kenichi Chiba; Hiroko Tanaka; Satoru Miyano; Theo de Witte; N.M.A. Blijlevens; Petra Muus; Gerwin Huls; Bert A. van der Reijden; Seishi Ogawa; Joop H. Jansen
Cancer development is a dynamic process during which the successive accumulation of mutations results in cells with increasingly malignant characteristics. Here, we show the clonal evolution pattern in myelodysplastic syndrome (MDS) patients receiving supportive care, with or without lenalidomide (follow-up 2.5–11 years). Whole-exome and targeted deep sequencing at multiple time points during the disease course reveals that both linear and branched evolutionary patterns occur with and without disease-modifying treatment. The application of disease-modifying therapy may create an evolutionary bottleneck after which more complex MDS, but also unrelated clones of haematopoietic cells, may emerge. In addition, subclones that acquired an additional mutation associated with treatment resistance (TP53) or disease progression (NRAS, KRAS) may be detected months before clinical changes become apparent. Monitoring the genetic landscape during the disease may help to guide treatment decisions.
Journal of the American Medical Informatics Association | 2018
Julian Varghese; Maren Kleine; Sophia Isabella Gessner; Sarah Sandmann; Martin Dugas
Objectives To systematically classify the clinical impact of computerized clinical decision support systems (CDSSs) in inpatient care. Materials and Methods Medline, Cochrane Trials, and Cochrane Reviews were searched for CDSS studies that assessed patient outcomes in inpatient settings. For each study, 2 physicians independently mapped patient outcome effects to a predefined medical effect score to assess the clinical impact of reported outcome effects. Disagreements were measured by using weighted kappa and solved by consensus. An example set of promising disease entities was generated based on medical effect scores and risk of bias assessment. To summarize technical characteristics of the systems, reported input variables and algorithm types were extracted as well. Results Seventy studies were included. Five (7%) reported reduced mortality, 16 (23%) reduced life-threatening events, and 28 (40%) reduced non-life-threatening events; 20 (29%) had no significant impact on patient outcomes, and 1 showed a negative effect (weighted κ: 0.72, P < .001). Six of 24 disease entity settings showed high effect scores with medium or low risk of bias: blood glucose management, blood transfusion management, physiologic deterioration prevention, pressure ulcer prevention, acute kidney injury prevention, and venous thromboembolism prophylaxis. Most of the implemented algorithms (72%) were rule-based. Reported input variables are shared as standardized models on a metadata repository. Discussion and Conclusion Most of the included CDSS studies were associated with positive patient outcome effects but with substantial differences regarding the clinical impact. A subset of 6 disease entities could be filtered in which CDSS should be given special consideration at sites where computer-assisted decision-making is deemed to be underutilized. Registration number on PROSPERO: CRD42016049946.
PLOS ONE | 2017
Sarah Sandmann; A.O. de Graaf; Mohsen Karimi; B.A. van der Reijden; Eva Hellström-Lindberg; Joop H. Jansen; Martin Dugas
Background There are various next-generation sequencing techniques, all of them striving to replace Sanger sequencing as the gold standard. However, false positive calls of single nucleotide variants and especially indels are a widely known problem of basically all sequencing platforms. Methods We considered three common next-generation sequencers—Roche 454, Ion Torrent PGM and Illumina NextSeq—and applied standard as well as optimized variant calling pipelines. Optimization was achieved by combining information of 23 diverse parameters characterizing the reported variants and generating individually calibrated generalized linear models. Models were calibrated using amplicon-based targeted sequencing data (19 genes, 28,775 bp) from seven to 12 myelodysplastic syndrome patients. Evaluation of the optimized pipelines and platforms was performed using sequencing data from three additional myelodysplastic syndrome patients. Results Using standard analysis methods, true mutations were missed and the obtained results contained many artifacts—no matter which platform was considered. Analysis of the parameters characterizing the true and false positive calls revealed significant platform- and variant specific differences. Application of optimized variant calling pipelines considerably improved results. 76% of all false positive single nucleotide variants and 97% of all false positive indels could be filtered out. Positive predictive values could be increased by factors of 1.07 to 1.27 in case of single nucleotide variant calling and by factors of 3.33 to 53.87 in case of indel calling. Application of the optimized variant calling pipelines leads to comparable results for all next-generation sequencing platforms analyzed. However, regarding clinical diagnostics it needs to be considered that even the optimized results still contained false positive as well as false negative calls.
Bioinformatics | 2018
Sarah Sandmann; Mohsen Karimi; Aniek O. de Graaf; Christian Rohde; Stefanie Göllner; Julian Varghese; Jan Ernsting; Gunilla Walldin; Bert A. van der Reijden; Carsten Müller-Tidow; Luca Malcovati; Eva Hellström-Lindberg; Joop H. Jansen; Martin Dugas
Motivation: The application of next‐generation sequencing in research and particularly in clinical routine requires valid variant calling results. However, evaluation of several commonly used tools has pointed out that not a single tool meets this requirement. False positive as well as false negative calls necessitate additional experiments and extensive manual work. Intelligent combination and output filtration of different tools could significantly improve the current situation. Results: We developed appreci8, an automatic variant calling pipeline for calling single nucleotide variants and short indels by combining and filtering the output of eight open‐source variant calling tools, based on a novel artifact‐ and polymorphism score. Appreci8 was trained on two data sets from patients with myelodysplastic syndrome, covering 165 Illumina samples. Subsequently, appreci8's performance was tested on five independent data sets, covering 513 samples. Variation in sequencing platform, target region and disease entity was considered. All calls were validated by re‐sequencing on the same platform, a different platform or expert‐based review. Sensitivity of appreci8 ranged between 0.93 and 1.00, while positive predictive value ranged between 0.65 and 1.00. In all cases, appreci8 showed superior performance compared to any evaluated alternative approach. Availability and implementation: Appreci8 is freely available at https://hub.docker.com/r/wwuimi/appreci8/. Sequencing data (BAM files) of the 678 patients analyzed with appreci8 have been deposited into the NCBI Sequence Read Archive (BioProjectID: 388411; https://www.ncbi.nlm.nih.gov/bioproject/PRJNA388411). Supplementary information: Supplementary data are available at Bioinformatics online.
Journal of Medical Internet Research | 2017
Julian Varghese; Sarah Sandmann; Martin Dugas
Background Medical coding is essential for standardized communication and integration of clinical data. The Unified Medical Language System by the National Library of Medicine is the largest clinical terminology system for medical coders and Natural Language Processing tools. However, the abundance of ambiguous codes leads to low rates of uniform coding among different coders. Objective The objective of our study was to measure uniform coding among different medical experts in terms of interrater reliability and analyze the effect on interrater reliability using an expert- and Web-based code suggestion system. Methods We conducted a quasi-experimental study in which 6 medical experts coded 602 medical items from structured quality assurance forms or free-text eligibility criteria of 20 different clinical trials. The medical item content was selected on the basis of mortality-leading diseases according to World Health Organization data. The intervention comprised using a semiautomatic code suggestion tool that is linked to a European information infrastructure providing a large medical text corpus of >300,000 medical form items with expert-assigned semantic codes. Krippendorff alpha (Kalpha) with bootstrap analysis was used for the interrater reliability analysis, and coding times were measured before and after the intervention. Results The intervention improved interrater reliability in structured quality assurance form items (from Kalpha=0.50, 95% CI 0.43-0.57 to Kalpha=0.62 95% CI 0.55-0.69) and free-text eligibility criteria (from Kalpha=0.19, 95% CI 0.14-0.24 to Kalpha=0.43, 95% CI 0.37-0.50) while preserving or slightly reducing the mean coding time per item for all 6 coders. 
Regardless of the intervention, precoordination and structured items were associated with significantly high interrater reliability, but the proportion of items that were precoordinated significantly increased after intervention (eligibility criteria: OR 4.92, 95% CI 2.78-8.72; quality assurance: OR 1.96, 95% CI 1.19-3.25). Conclusions The Web-based code suggestion mechanism improved interrater reliability toward moderate or even substantial intercoder agreement. Precoordination and the use of structured versus free-text data elements are key drivers of higher interrater reliability.
BMC Bioinformatics | 2017
Sarah Sandmann; Aniek O. de Graaf; Martin Dugas
Background: Deriving valid variant calling results from raw next-generation sequencing data is a particularly challenging task, especially with respect to clinical diagnostics and personalized medicine. However, when using classic variant calling software, the user usually obtains nothing more than a list of variants that pass the corresponding caller’s internal filters. Any expected mutations (e.g. hotspot mutations), that have not been called by the software, need to be investigated manually. Results: BBCAnalyzer (Bases By CIGAR Analyzer) provides a novel visual approach to facilitate this step of time-consuming, manual inspection of common mutation sites. BBCAnalyzer is able to visualize base counts at predefined positions or regions in any sequence alignment data that are available as BAM files. Thereby, the tool provides a straightforward solution for evaluating any list of expected mutations like hotspot mutations, or even whole regions of interest. In addition to an ordinary textual report, BBCAnalyzer reports highly customizable plots. Information on the counted number of bases, the reference bases, known mutations or polymorphisms, called mutations and base qualities is summarized in a single plot. By uniting this information in a graphical way, the user may easily decide on a variant being present or not – completely independent of any internal filters or frequency thresholds. Conclusions: BBCAnalyzer provides a unique, novel approach to facilitate variant calling where classical tools frequently fail to call. The R package is freely available at http://bioconductor.org. The local web application is available at Additional file 2. A documentation of the R package (Additional file 1) as well as the web application (Additional file 2) with detailed descriptions, examples of all input- and output elements, exemplary code as well as exemplary data are included. A video demonstrates the exemplary usage of the local web application (Additional file 3).
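The idea of counting bases at a predefined position by walking each read's CIGAR string can be sketched in a few lines of Python. This is only an illustration of the general technique, not BBCAnalyzer's implementation (which is an R package). Reads are represented here as hypothetical (start position, CIGAR, sequence) tuples instead of being parsed from a BAM file, and the function names `base_at` and `count_bases` are invented for this example.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(ref_pos, read_start, cigar, seq):
    """Return the read base aligned to ref_pos, '-' for a deletion, else None."""
    ref, qry = read_start, 0
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":            # consumes reference and read
            if ref <= ref_pos < ref + length:
                return seq[qry + (ref_pos - ref)]
            ref += length
            qry += length
        elif op in "IS":           # insertion / soft clip: consumes read only
            qry += length
        elif op in "DN":           # deletion / skipped region: reference only
            if ref <= ref_pos < ref + length:
                return "-" if op == "D" else None
            ref += length
        # H and P consume neither reference nor read
    return None                    # read does not cover ref_pos


def count_bases(ref_pos, reads):
    """Count the bases (and deletions) that the given reads show at ref_pos."""
    counts = Counter()
    for start, cigar, seq in reads:
        base = base_at(ref_pos, start, cigar, seq)
        if base is not None:
            counts[base] += 1
    return counts


# Hypothetical reads: (leftmost reference position, CIGAR, read sequence).
reads = [
    (100, "5M", "ACGTA"),        # plain match
    (101, "2S3M", "TTCTA"),      # two soft-clipped bases before the alignment
    (100, "2M1D2M", "ACTA"),     # deletion covering position 102
    (99, "2M2I3M", "GAGGCGT"),   # insertion before position 102
]
counts = count_bases(102, reads)
print(dict(counts))
```

Counting per position in this way is independent of any caller's internal filters, which is the property the abstract emphasizes; base qualities and strand information are left out of the sketch.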
F1000Research | 2018
Marta Interlandi; Sarah Sandmann; Inaki Soto Rey; Michael Storck; Marcel Trautmann; Wolfgang Hartmann; Martin Dugas
MedInfo | 2017
Julian Varghese; Maren Kleine; Sophia Isabella Gessner; Sarah Sandmann; Martin Dugas
F1000Research | 2017
Sarah Sandmann; Aniek O. de Graaf; Mohsen Karimi; Bert A. van der Reijden; Eva Hellström-Lindberg; Joop H. Jansen; Martin Dugas