Publication


Featured research published by Botond Sipos.


Nature Methods | 2013

Systematic evaluation of spliced alignment programs for RNA-seq data

Pär G. Engström; Tamara Steijger; Botond Sipos; Gregory R. Grant; André Kahles; Gunnar Rätsch; Nick Goldman; Tim Hubbard; Jennifer Harrow; Roderic Guigó; Paul Bertone

High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.
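One of the benchmarks above is exon junction discovery, and the authors call for a lower false discovery rate for splice junctions. The sketch below shows how sensitivity and FDR can be computed when the true junctions are known, as with simulated data. It is only an illustration: the junction tuple layout, the exact-coordinate matching rule and the function name are assumptions, not the study's definitions.

```python
from typing import Dict, Set, Tuple

# Hypothetical junction key: (chromosome, donor position, acceptor position, strand).
Junction = Tuple[str, int, int, str]


def junction_metrics(reported: Set[Junction], truth: Set[Junction]) -> Dict[str, float]:
    """Score reported splice junctions against a known truth set (exact coordinate match)."""
    true_pos = len(reported & truth)    # reported junctions that really exist
    false_pos = len(reported - truth)   # reported junctions absent from the truth set
    sensitivity = true_pos / len(truth) if truth else 0.0
    fdr = false_pos / len(reported) if reported else 0.0
    return {"true_positives": true_pos, "false_positives": false_pos,
            "sensitivity": sensitivity, "fdr": fdr}


# Toy example with made-up coordinates.
truth = {("chr1", 1000, 2000, "+"), ("chr1", 3000, 4000, "+"), ("chr2", 500, 900, "-")}
reported = {("chr1", 1000, 2000, "+"), ("chr2", 500, 900, "-"),
            ("chr1", 3000, 4100, "+"), ("chr3", 10, 50, "+")}
metrics = junction_metrics(reported, truth)
print(metrics)
```

Here two of the four reported junctions are correct, so the FDR is 0.5, and two of the three true junctions are recovered.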


Nature | 2013

Towards practical, high-capacity, low-maintenance information storage in synthesized DNA

Nick Goldman; Paul Bertone; Siyuan Chen; Christophe Dessimoz; Emily LeProust; Botond Sipos; Ewan Birney

Digital production, transmission and storage have revolutionized how we access and use information but have also made archiving an increasingly complex task that requires active, continuing maintenance of digital media. This challenge has focused some interest on DNA as an attractive target for information storage because of its capacity for high-density information encoding, longevity under easily achieved conditions and proven track record as an information bearer. Previous DNA-based information storage approaches have encoded only trivial amounts of information or were not amenable to scaling-up, and used no robust error-correction and lacked examination of their cost-efficiency for large-scale information archival. Here we describe a scalable method that can reliably store more information than has been handled before. We encoded computer files totalling 739 kilobytes of hard-disk storage and with an estimated Shannon information of 5.2 × 10⁶ bits into a DNA code, synthesized this DNA, sequenced it and reconstructed the original files with 100% accuracy. Theoretical analysis indicates that our DNA-based storage scheme could be scaled far beyond current global information volumes and offers a realistic technology for large-scale, long-term and infrequently accessed digital archiving. In fact, current trends in technological advances are reducing DNA synthesis costs at a pace that should make our scheme cost-effective for sub-50-year archiving within a decade.
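The abstract does not say how the Shannon information of the files was estimated. For illustration only, the simplest estimate is an order-0 one: it multiplies the file length by the per-byte entropy and assumes the bytes are independent. The function below is hypothetical and is not claimed to be the authors' method.

```python
import math
from collections import Counter


def order0_shannon_bits(data: bytes) -> float:
    """Total information estimate: length times per-byte entropy, assuming independent bytes."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating over bytes yields integer byte values
    entropy_per_byte = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return n * entropy_per_byte


print(order0_shannon_bits(b"abab"))            # two equally frequent symbols: 1 bit per byte
print(order0_shannon_bits(bytes(range(256))))  # all byte values once: 8 bits per byte
```

For comparison, 739 kilobytes at 8 bits per byte is about 5.9 × 10⁶ bits (1 kB = 1,000 bytes) or 6.1 × 10⁶ bits (1 kB = 1,024 bytes). An estimate of 5.2 × 10⁶ bits therefore suggests some redundancy in the files.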


Nature Genetics | 2011

Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome

Cornelis A. Albers; Ana Cvejic; Rémi Favier; Evelien E Bouwmans; Marie-Christine Alessi; Paul Bertone; Gregory Jordan; Ross Kettleborough; Graham Kiddle; Myrto Kostadima; Randy J. Read; Botond Sipos; Suthesh Sivapalaratnam; Peter A. Smethurst; Jonathan Stephens; Katrin Voss; Alan T. Nurden; Augusto Rendon; Paquita Nurden; Willem H. Ouwehand

Gray platelet syndrome (GPS) is a predominantly recessive platelet disorder that is characterized by mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated individuals and identified NBEAL2 as the causative gene; it has no previously known function but is a member of a gene family that is involved in granule development. Silencing of nbeal2 in zebrafish abrogated thrombocyte formation.


Nature Genetics | 2013

SMIM1 underlies the Vel blood group and influences red blood cell traits

Ana Cvejic; Lonneke Haer-Wigman; Jonathan Stephens; Myrto Kostadima; Peter A. Smethurst; Mattia Frontini; Emile van den Akker; Paul Bertone; Ewa Bielczyk-Maczyńska; Samantha Farrow; Rudolf S. N. Fehrmann; Alan Gray; Masja de Haas; Vincent G. Haver; Gregory Jordan; Juha Karjalainen; Hindrik Hd Kerstens; Graham Kiddle; Heather Lloyd-Jones; Malcolm Needs; Joyce Poole; Aicha Ait Soussan; Augusto Rendon; Klaus Rieneck; Jennifer Sambrook; Hein Schepers; Herman H. W. Silljé; Botond Sipos; Dorine W. Swinkels; Asif U. Tamuri

The blood group Vel was discovered 60 years ago, but the underlying gene is unknown. Individuals negative for the Vel antigen are rare and are required for the safe transfusion of patients with antibodies to Vel. To identify the responsible gene, we sequenced the exomes of five individuals negative for the Vel antigen and found that four were homozygous and one was heterozygous for a low-frequency 17-nucleotide frameshift deletion in the gene encoding the 78-amino-acid transmembrane protein SMIM1. A follow-up study showing that 59 of 64 Vel-negative individuals were homozygous for the same deletion, together with expression of the Vel antigen on SMIM1-transfected cells, confirms SMIM1 as the gene underlying the Vel blood group. The common SNP rs1175550, an expression quantitative trait locus (eQTL), contributes to variable expression of the Vel antigen (P = 0.003) and influences the mean hemoglobin concentration of red blood cells (RBCs; P = 8.6 × 10⁻¹⁵). In vivo, zebrafish with smim1 knockdown showed a mild reduction in the number of RBCs, identifying SMIM1 as a new regulator of RBC formation. Our findings are of immediate relevance, as the homozygous presence of the deletion allows the unequivocal identification of Vel-negative blood donors.


PLOS Computational Biology | 2014

Phylogenetic quantification of intra-tumour heterogeneity

Roland F. Schwarz; Anne Trinh; Botond Sipos; James D. Brenton; Nick Goldman; Florian Markowetz

Intra-tumour genetic heterogeneity is the result of ongoing evolutionary change within each cancer. The expansion of genetically distinct sub-clonal populations may explain the emergence of drug resistance, and if so, would have prognostic and predictive utility. However, methods for objectively quantifying tumour heterogeneity have been missing and are particularly difficult to establish in cancers where predominant copy number variation prevents accurate phylogenetic reconstruction owing to horizontal dependencies caused by long and cascading genomic rearrangements. To address these challenges, we present MEDICC, a method for phylogenetic reconstruction and heterogeneity quantification based on a Minimum Event Distance for Intra-tumour Copy-number Comparisons. Using a transducer-based pairwise comparison function, we determine optimal phasing of major and minor alleles, as well as evolutionary distances between samples, and are able to reconstruct ancestral genomes. Rigorous simulations and an extensive clinical study show the power of our method, which outperforms state-of-the-art competitors in reconstruction accuracy, and additionally allows unbiased numerical quantification of tumour heterogeneity. Accurate quantification and evolutionary inference are essential to understand the functional consequences of tumour heterogeneity. The MEDICC algorithms are independent of the experimental techniques used and are applicable to both next-generation sequencing and array CGH data.
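MEDICC uses finite-state transducers on two phased alleles. The sketch below is a much simpler, hypothetical illustration of what a "minimum event distance" means for a single copy-number profile. Here an event gains or loses one copy over a contiguous run of segments. It is not the MEDICC algorithm. It also ignores constraints that a realistic model needs, for example that a segment reduced to copy number 0 cannot be regained.

```python
from typing import Sequence


def min_segmental_events(source: Sequence[int], target: Sequence[int]) -> int:
    """Minimum number of +1/-1 events over contiguous segments turning source into target.

    Simplification: single profile, no phasing, no special treatment of copy number 0.
    """
    if len(source) != len(target):
        raise ValueError("profiles must have the same number of segments")
    diff = [t - s for s, t in zip(source, target)]
    padded = [0] + diff + [0]
    # Each contiguous event changes exactly two entries of the step vector (one by +1,
    # one by -1), so the minimum number of events equals the total positive step height.
    return sum(max(0, padded[i] - padded[i - 1]) for i in range(1, len(padded)))


print(min_segmental_events([1, 1, 1], [2, 2, 1]))  # one gain spanning two segments
print(min_segmental_events([1, 1, 1], [2, 1, 2]))  # two separate gains
print(min_segmental_events([2, 2], [0, 1]))        # a loss over both, then a loss on the first
```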


Nature Methods | 2018

Highly parallel direct RNA sequencing on an array of nanopores

Daniel Ryan Garalde; Elizabeth A Snell; Daniel Jachimowicz; Botond Sipos; Joseph Hargreaves Lloyd; Mark Bruce; Nadia Pantic; Tigist Admassu; Phillip James; Anthony Warland; Michael Jordan; Jonah Ciccone; Sabrina Serra; Jemma Keenan; Samuel Martin; Luke McNeill; E. Jayne Wallace; Lakmal Jayasinghe; Christopher James Wright; Javier Blasco; Stephen Young; Denise Brocklebank; Sissel Juul; James Clarke; Andrew John Heron; Daniel J. Turner

Sequencing the RNA in a biological sample can unlock a wealth of information, including the identity of bacteria and viruses, the nuances of alternative splicing or the transcriptional state of organisms. However, current methods have limitations due to short read lengths and reverse transcription or amplification biases. Here we demonstrate nanopore direct RNA-seq, a highly parallel, real-time, single-molecule method that circumvents reverse transcription or amplification steps. This method yields full-length, strand-specific RNA sequences and enables the direct detection of nucleotide analogs in RNA.


PLOS Computational Biology | 2013

Amino acid changes in disease-associated variants differ radically from variants observed in the 1000 Genomes Project dataset

Tjaart A. P. de Beer; Roman A. Laskowski; Sarah L. Parks; Botond Sipos; Nick Goldman; Janet M. Thornton

The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes arginine mutability very much higher than other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.
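The asymmetry arises because each exchange is counted in its known direction, reference to alternative. The sketch below builds such a directed count matrix from made-up (reference, alternative) amino acid pairs. Its "exchanges per residue occurrence" mutability is a simple assumption for illustration and is not the normalisation used in the study. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def exchange_counts(variants: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Directed counts: matrix[ref][alt] is the number of ref -> alt exchanges (asymmetric)."""
    matrix: Dict[str, Counter] = defaultdict(Counter)
    for ref, alt in variants:
        if ref != alt:  # ignore synonymous (unchanged) residues
            matrix[ref][alt] += 1
    return matrix


def relative_mutability(variants: Iterable[Tuple[str, str]],
                        residue_counts: Dict[str, int]) -> Dict[str, float]:
    """Exchanges originating from each amino acid, divided by how often it occurs."""
    matrix = exchange_counts(variants)
    return {aa: sum(matrix[aa].values()) / n for aa, n in residue_counts.items() if n > 0}


# Made-up data: arginine (R) mutates more often than histidine (H) or leucine (L).
variants = [("R", "H"), ("R", "H"), ("R", "C"), ("H", "R"), ("L", "P")]
residue_counts = {"R": 10, "H": 10, "L": 20}
matrix = exchange_counts(variants)
print(matrix["R"]["H"], matrix["H"]["R"])  # prints: 2 1
print(relative_mutability(variants, residue_counts))
```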


BMC Bioinformatics | 2011

PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment

Botond Sipos; Tim Massingham; Gregory Jordan; Nick Goldman

Background: The Monte Carlo simulation of sequence evolution is routinely used to assess the performance of phylogenetic inference methods and sequence alignment algorithms. Progress in the field of molecular evolution fuels the need for more realistic and hence more complex simulations, adapted to particular situations, yet current software makes unreasonable assumptions such as homogeneous substitution dynamics or a uniform distribution of indels across the simulated sequences. This calls for an extensible simulation framework written in a high-level functional language, offering new functionality and making it easy to incorporate further complexity.

Results: PhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs.

Conclusions: Close integration with R and the wide range of features implemented offer unmatched flexibility, making it possible to simulate sequence evolution under a wide range of realistic settings. We believe that PhyloSim will be useful to future studies involving simulated alignments.
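PhyloSim itself is written in R. The following is only a Python sketch of the Gillespie idea described above, in which several concurrent processes (substitution, single-base insertion, deletion) act on a sequence. It assumes constant per-site rates, whereas PhyloSim supports much richer rate variation and indel processes. The function and parameter names are hypothetical.

```python
import random
from typing import List

ALPHABET = "ACGT"


def gillespie_evolve(seq: List[str], t_max: float, sub_rate: float, ins_rate: float,
                     del_rate: float, rng: random.Random) -> List[str]:
    """Evolve seq for time t_max under per-site substitution, insertion and deletion rates."""
    seq = list(seq)
    t = 0.0
    while seq:
        n = len(seq)
        # Total rate of each process summed over all sites (recomputed as the length changes).
        rates = [sub_rate * n, ins_rate * n, del_rate * n]
        total = sum(rates)
        if total <= 0:
            break
        # The waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Pick the process in proportion to its rate, then a site uniformly at random.
        kind = rng.choices(["sub", "ins", "del"], weights=rates)[0]
        pos = rng.randrange(n)
        if kind == "sub":
            seq[pos] = rng.choice([b for b in ALPHABET if b != seq[pos]])
        elif kind == "ins":
            seq.insert(pos + 1, rng.choice(ALPHABET))
        else:
            del seq[pos]
    return seq


rng = random.Random(42)
evolved = gillespie_evolve(list("ACGTACGTAC"), t_max=0.5, sub_rate=1.0,
                           ins_rate=0.1, del_rate=0.1, rng=rng)
print("".join(evolved))
```

Because the rates are recomputed after every event, insertions and deletions immediately change how often each process fires. This is what lets the Gillespie approach integrate concurrent processes without a fixed time step.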


PLOS ONE | 2012

An Improved Protocol for Sequencing of Repetitive Genomic Regions and Structural Variations Using Mutagenesis and Next Generation Sequencing

Botond Sipos; Tim Massingham; Adrian M. Stütz; Nick Goldman

The rise of Next Generation Sequencing (NGS) technologies has transformed de novo genome sequencing into an accessible research tool, but obtaining high-quality eukaryotic genome assemblies remains a challenge, mostly due to the abundance of repetitive elements. These also make it difficult to study nucleotide polymorphism in repetitive regions, including certain types of structural variations. One solution proposed for resolving such regions is Sequence Assembly aided by Mutagenesis (SAM), which relies on the fact that introducing enough random mutations breaks the repetitive structure, making assembly possible. Sequencing many different mutated copies permits the sequence of the repetitive region to be inferred by consensus methods. However, this approach relies on molecular cloning in order to isolate and amplify individual mutant copies, making it hard to scale up the approach for use in conjunction with high-throughput sequencing technologies. To address this problem, we propose NG-SAM, a modified version of the SAM protocol that relies on PCR and dilution steps only, coupled to an NGS workflow. NG-SAM therefore has the potential to be scaled up, e.g. using emerging microfluidics technologies. We built a realistic simulation pipeline to study the feasibility of NG-SAM, and our results suggest that under appropriate experimental conditions the approach might be successfully put into practice. Moreover, our simulations suggest that NG-SAM is capable of robustly reconstructing a wide range of potential target sequences of varying lengths and repetitive structures.
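The consensus step can be illustrated with a toy simulation. Many randomly mutated copies of a repetitive target are generated, and the original is recovered by a per-position majority vote. The sketch assumes substitution-only mutagenesis and copies that are already aligned without gaps. In practice the mutated copies would first have to be assembled. The names are hypothetical and this is not the NG-SAM pipeline.

```python
import random
from collections import Counter
from typing import List


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each position with probability `rate` (a different base is chosen)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(copies: List[str]) -> str:
    """Majority base per column; copies are assumed gap-free and already aligned."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


rng = random.Random(0)
target = "ACGTTGCA" * 4                    # a short, highly repetitive toy target
copies = [mutate(target, 0.1, rng) for _ in range(51)]
print(consensus(copies) == target)
```

As long as the mutation rate stays well below 50% and enough copies are sequenced, the correct base outnumbers every alternative in each column. This is why consensus can undo the mutations introduced to break the repeats.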


bioRxiv | 2016

PASP - a whole-transcriptome poly(A) tail length determination assay for the Illumina platform

Botond Sipos; Adrian M. Stütz; Greg Slodkowicz; Tim Massingham; Jan O. Korbel; Nick Goldman

The poly(A) tail, co-transcriptionally added to most eukaryotic RNAs, plays an important role in post-transcriptional regulation through modulating mRNA stability and translational efficiency. The length of the poly(A) tail is dynamic, decreasing or increasing in response to various stimuli through the action of enzymatic complexes, and changes in tail length are exploited in regulatory pathways implicated in various biological processes. To date, assessment of poly(A) tail length has mostly relied on protocols targeting only a few transcripts. We present PASP (‘poly(A) tail sequencing protocol’), a whole-transcriptome approach to measure tail lengths — including a computational pipeline implementing all necessary analyses. PASP uses direct Illumina sequencing of cDNA fragments obtained through G-tailing of poly(A)-selected mRNA followed by fragmentation and reverse transcription. Analysis of reads corresponding to spike-in poly(A) tracts of known length indicated that mean tail lengths can be confidently measured, given sufficient coverage. We further explored the utility of our approach by comparing tail lengths estimated from wild type and Δccr4-1/pan2 mutant yeasts. The yeast whole-transcriptome tail length distributions showed high consistency between biological replicates, and the expected upward shift in tail lengths in the mutant samples was detected. This suggests that PASP is suitable for the assessment of global polyadenylation status in yeast. The correlation of per-transcript mean tail lengths between biological and technical replicates was low (higher between mutant samples). Both, however, reached high values after filtering for transcripts with greater coverage. We also compare our results with those of other methods. 
We identify a number of improvements that could be used in future PASP experiments and, based on our results, believe that direct sequencing of poly(A) tails can become the method of choice for studying polyadenylation using the Illumina platform.
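The abstract reports that agreement of per-transcript mean tail lengths between replicates improved after filtering for coverage. The sketch below shows such a filtering step. It assumes that per-read tail-length estimates have already been extracted and assigned to transcripts. The transcript names, the threshold and the function are hypothetical and not part of the PASP pipeline.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def per_transcript_means(read_estimates: Iterable[Tuple[str, float]],
                         min_reads: int) -> Dict[str, Tuple[float, float, int]]:
    """Mean tail length, standard error and read count per transcript with enough reads."""
    by_tx: Dict[str, List[float]] = defaultdict(list)
    for tx, length in read_estimates:
        by_tx[tx].append(length)
    result = {}
    for tx, lengths in by_tx.items():
        n = len(lengths)
        if n < min_reads:
            continue  # drop poorly covered transcripts
        mean = sum(lengths) / n
        var = sum((x - mean) ** 2 for x in lengths) / (n - 1) if n > 1 else 0.0
        result[tx] = (mean, math.sqrt(var / n), n)
    return result


estimates = [("tx_a", 20.0), ("tx_a", 30.0), ("tx_a", 40.0), ("tx_b", 55.0)]
print(per_transcript_means(estimates, min_reads=2))
```

The standard error of a mean shrinks roughly as 1/√n with the number of reads, so per-transcript estimates from well-covered transcripts are expected to agree better between replicates.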

Collaboration



Top Co-Authors

Nick Goldman, European Bioinformatics Institute
Paul Bertone, Medical Research Council
Tim Massingham, European Bioinformatics Institute
Gregory Jordan, European Bioinformatics Institute
Adrian M. Stütz, European Bioinformatics Institute
Ana Cvejic, Wellcome Trust Sanger Institute
Greg Slodkowicz, European Bioinformatics Institute