Publication


Featured research published by Szymon M. Kiełbasa.


Genome Research | 2011

Adaptive seeds tame genomic sequence comparison

Szymon M. Kiełbasa; Raymond Wan; Kengo Sato; Paul Horton; Martin C. Frith

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
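The idea of choosing seeds by rareness rather than by fixed length can be illustrated with a minimal sketch. It assumes a naive substring count in place of an index structure, and the names `count_occurrences`, `adaptive_seed`, and `max_hits` are hypothetical, not part of LAST:

```python
def count_occurrences(reference: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of pattern in reference."""
    count = 0
    start = reference.find(pattern)
    while start != -1:
        count += 1
        start = reference.find(pattern, start + 1)
    return count


def adaptive_seed(reference: str, query: str, start: int, max_hits: int):
    """Return the shortest match starting at query[start] that is rare enough.

    The match is lengthened one base at a time until it occurs at most
    max_hits times in the reference (illustrative rareness threshold).
    Returns None if no such match exists.
    """
    for end in range(start + 1, len(query) + 1):
        seed = query[start:end]
        hits = count_occurrences(reference, seed)
        if hits == 0:
            return None  # a longer match cannot exist either
        if hits <= max_hits:
            return seed
    return None
```

In a repetitive region the seed grows until it becomes rare, while in a unique region it stays short, so the number of seed hits per query position stays bounded by `max_hits`.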


Nature Biotechnology | 2013

Evaluation of methods for modeling transcription factor sequence specificity

Matthew T. Weirauch; Raquel Norel; Matti Annala; Yue Zhao; Todd Riley; Julio Saez-Rodriguez; Thomas Cokelaer; Anastasia Vedenko; Shaheynoor Talukder; Phaedra Agius; Aaron Arvey; Philipp Bucher; Curtis G. Callan; Cheng Wei Chang; Chien-Yu Chen; Yong-Syuan Chen; Yu-Wei Chu; Jan Grau; Ivo Grosse; Vidhya Jagannathan; Jens Keilwagen; Szymon M. Kiełbasa; Justin B. Kinney; Holger Klein; Miron B. Kursa; Harri Lähdesmäki; Kirsti Laurila; Chengwei Lei; Christina S. Leslie; Chaim Linhart

Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro–derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
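For readers unfamiliar with mononucleotide position weight matrices, the following minimal sketch shows how one can be built from aligned sites and used to scan a sequence. The pseudocount, the uniform background, and the function names are assumptions made for illustration and are not taken from the study:

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix (one dict per position) from aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores; positions are treated as independent."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, position) of the highest-scoring window in sequence."""
    width = len(pwm)
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

Because each position contributes independently to the score, such a model cannot capture dependencies between neighbouring bases, which is one way the more complex models mentioned in the abstract differ.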


PLOS ONE | 2009

Regulation of Clock-Controlled Genes in Mammals.

Katarzyna Bozek; Angela Relógio; Szymon M. Kiełbasa; Markus Heine; Christof Dame; Achim Kramer; Hanspeter Herzel

The complexity of tissue- and daytime-specific regulation of thousands of clock-controlled genes (CCGs) suggests that many regulatory mechanisms contribute to the transcriptional output of the circadian clock. We aim to predict these mechanisms using a large-scale promoter analysis of CCGs. Our study is based on a meta-analysis of DNA-array data from rodent tissues. We searched in the promoter regions of 2065 CCGs for highly overrepresented transcription factor binding sites. To compensate for the relatively high GC content of CCG promoters, we employed a novel background model that avoids a bias towards GC-rich motifs. We found that many of the transcription factors with overrepresented binding sites in CCG promoters themselves exhibit circadian rhythms. Among the predicted factors are known regulators such as CLOCK:BMAL1, DBP, HLF, E4BP4, CREB, RORα and the recently described regulators HSF1, STAT3, SP1 and HNF-4α. PAX-4, C/EBP, EVI-1, IRF, E2F, AP-1, HIF-1 and NF-Y were identified as additional promising candidates for circadian transcriptional regulators. Moreover, GC-rich motifs (SP1, EGR, ZF5, AP-2, WT1, NRF-1) and AT-rich motifs (MEF-2, HMGIY, HNF-1, OCT-1) are significantly overrepresented in promoter regions of CCGs. Putative tissue-specific binding sites such as HNF-3 for liver, NKX2.5 for heart or Myogenin for skeletal muscle were found. We analysed the regulation of the erythropoietin (Epo) gene, which exhibits many binding sites for circadian regulators, and provide experimental evidence for its circadian-regulated expression in the adult murine kidney. Based on a comprehensive literature search, we integrate our predictions into a regulatory network of core clock and clock-controlled genes. Our large-scale analysis of the CCG promoters reveals the complexity and extensiveness of circadian regulation in mammals. Results of this study point to connections of the circadian clock to other functional systems including metabolism, endocrine regulation and pharmacokinetics.


Nucleic Acids Research | 2005

Inferring combinatorial regulation of transcription in silico

Nils Blüthgen; Szymon M. Kiełbasa; Hanspeter Herzel

In this paper, we propose a functional view on the in silico prediction of transcriptional regulation. We present a method to predict biological functions regulated by a combinatorial interaction of transcription factors. Using a rigorous statistic, this approach intersects the presence of transcription factor binding sites in gene upstream sequences with Gene Ontology terms associated with these genes. We demonstrate that for the well-studied set of skeletal muscle-related transcription factors Myf-2, Mef and TEF, the correct functions are predicted. Furthermore, starting from the well-characterized promoter of a gene expressed upon lipopolysaccharide stimulation, we predict functional targets of this stimulus. These results are in excellent agreement with microarray data.
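The abstract does not name the statistic it uses. As an illustration only, a common way to test whether genes carrying a set of binding sites are enriched for a Gene Ontology term is the upper tail of the hypergeometric distribution. The sketch below assumes that choice, and all names in it are hypothetical:

```python
from math import comb


def hypergeom_tail(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement.

    population: all genes considered
    annotated:  genes in the population carrying the GO term
    sample:     genes whose upstream region contains the binding sites
    overlap:    genes in the sample that carry the GO term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(population, sample)
```

A small tail probability suggests that the overlap between site-containing genes and the GO term is larger than expected by chance. When many terms are tested, a multiple-testing correction would also be needed.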


BMC Bioinformatics | 2005

Measuring similarities between transcription factor binding sites.

Szymon M. Kiełbasa; Didier Gonze; Hanspeter Herzel

Background: Collections of transcription factor binding profiles (Transfac, Jaspar) are essential to identify regulatory elements in DNA sequences. Subsets of highly similar profiles complicate large-scale analysis of transcription factor binding sites. Results: We propose to identify and group similar profiles using two independent similarity measures: χ² distances between position frequency matrices (PFMs) and correlation coefficients between position weight matrix (PWM) scores. Conclusion: We show that these measures complement each other and allow Jaspar and Transfac matrices to be associated. Clusters of highly similar matrices are identified and can be used to optimise the search for regulatory elements. Moreover, the application of the measures is illustrated by assigning E-box matrices of a SELEX experiment and of experimentally characterised binding sites of circadian clock genes to the Myc-Max cluster.
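The first of the two measures can be illustrated with a simplified sketch. It assumes two PFMs of equal width that are already aligned, sums a column-wise χ² homogeneity statistic, and does not reproduce the paper's handling of shifts or its exact definition. The function names are hypothetical:

```python
def column_chi2(col_a, col_b):
    """Chi-square homogeneity statistic for two count columns over A, C, G, T.

    Both columns are assumed to contain at least one observation.
    """
    total_a, total_b = sum(col_a), sum(col_b)
    grand = total_a + total_b
    stat = 0.0
    for a, b in zip(col_a, col_b):
        pooled = a + b
        if pooled == 0:
            continue  # base unseen in both columns contributes nothing
        exp_a = total_a * pooled / grand
        exp_b = total_b * pooled / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat


def pfm_chi2_distance(pfm_a, pfm_b):
    """Sum the column statistics over two aligned, equal-width PFMs."""
    return sum(column_chi2(a, b) for a, b in zip(pfm_a, pfm_b))
```

Identical matrices give a distance of zero, and the statistic grows as the base counts of corresponding columns diverge.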


Nucleic Acids Research | 2004

HuSiDa—the human siRNA database: an open-access database for published functional siRNA sequences and technical details of efficient transfer into recipient cells

Matthias Truss; Maciej Swat; Szymon M. Kiełbasa; Reinhold Schäfer; Hanspeter Herzel; Christian Hagemeier

Small interfering RNAs (siRNAs) have become a standard tool in functional genomics. Once incorporated into the RNA-induced silencing complex (RISC), siRNAs mediate the specific recognition of corresponding target mRNAs and their cleavage. However, only a small fraction of randomly chosen siRNA sequences is able to induce efficient gene silencing. In common laboratory practice, successful RNA interference experiments typically require both the labour- and cost-intensive identification of an active siRNA sequence and the optimization of target cell line-specific procedures for optimal siRNA delivery. To optimize the design and performance of siRNA experiments, we have established the human siRNA database (HuSiDa). The database provides sequences of published functional siRNA molecules targeting human genes and important technical details of the corresponding gene silencing experiments, including the mode of siRNA generation, recipient cell lines, transfection reagents and procedures and direct links to published references (PubMed). The database can be accessed at http://www.human-siRNA-database.net. We used the siRNA sequence information stored in the database for scrutinizing published sequence selection parameters for efficient gene silencing.


PLOS Genetics | 2010

Identification of Y-box binding protein 1 as a core regulator of MEK/ERK pathway dependent gene signatures in colorectal cancer cells.

Karsten Jürchott; Ralf-Jürgen Kuban; Till Krech; Nils Blüthgen; Ulrike Stein; Wolfgang Walther; Christian Friese; Szymon M. Kiełbasa; Ute Ungethüm; Per-Eric Lund; Thomas Knösel; Wolfgang Kemmner; Markus Morkel; Johannes Fritzmann; Peter M. Schlag; Walter Birchmeier; Tammo Krueger; Silke Sperling; Christine Sers; Hans-Dieter Royer; Hanspeter Herzel; Reinhold Schäfer

Transcriptional signatures are an indispensable source of correlative information on disease-related molecular alterations on a genome-wide level. Numerous candidate genes involved in disease and in factors of predictive, as well as of prognostic, value have been deduced from such molecular portraits, e.g. in cancer. However, mechanistic insights into the regulatory principles governing global transcriptional changes are lagging behind extensive compilations of deregulated genes. To identify regulators of transcriptome alterations, we used an integrated approach combining transcriptional profiling of colorectal cancer cell lines treated with inhibitors targeting the receptor tyrosine kinase (RTK)/RAS/mitogen-activated protein kinase pathway, computational prediction of regulatory elements in promoters of co-regulated genes, chromatin-based and functional cellular assays. We identified commonly co-regulated, proliferation-associated target genes that respond to the MAPK pathway. We recognized E2F and NFY transcription factor binding sites as prevalent motifs in those pathway-responsive genes and confirmed the predicted regulatory role of Y-box binding protein 1 (YBX1) by reporter gene, gel shift, and chromatin immunoprecipitation assays. We also validated the MAPK-dependent gene signature in colorectal cancers and provided evidence for the association of YBX1 with poor prognosis in colorectal cancer patients. This suggests that MEK/ERK-dependent, YBX1-regulated target genes are involved in executing malignant properties.


Bioinformatics | 2001

Combining frequency and positional information to predict transcription factor binding sites.

Szymon M. Kiełbasa; Jan O. Korbel; Dieter Beule; Hanspeter Herzel

MOTIVATION: Even though a number of genome projects have been finished on the sequence level, still only a small proportion of DNA regulatory elements have been identified. Growing amounts of gene expression data provide the possibility of finding coregulated genes by clustering methods. By analysis of the promoter regions of those genes, rather weak signals of transcription factor binding sites may be detected. RESULTS: We introduce the new algorithm ITB, an Integrated Tool for Box finding, which combines frequency and positional information to predict transcription factor binding sites in upstream regions of coregulated genes. Motifs are extracted by exhaustive analysis of regular expression-like patterns and by estimating probabilities of positional clusters of motifs. ITB detects consensus sequences of experimentally verified transcription factor binding sites of the yeast Saccharomyces cerevisiae. Moreover, a number of new binding site candidates with significant scores are predicted. Besides applying ITB to yeast upstream regions, the program is run on human promoter sequences. AVAILABILITY: ITB is available upon request.


Trends in Genetics | 2009

Methylation and deamination of CpGs generate p53-binding sites on a genomic scale

Tomasz Zemojtel; Szymon M. Kiełbasa; Peter F. Arndt; Ho-Ryun Chung; Martin Vingron

The formation of transcription-factor-binding sites is an important evolutionary process. Here, we show that methylation and deamination of CpG dinucleotides generate in vivo p53-binding sites in numerous Alu elements and in non-repetitive DNA in a species-specific manner. In light of this, we propose that the deamination of methylated CpGs constitutes a universal mechanism for de novo generation of various transcription-factor-binding sites in Alus.


FEBS Journal | 2009

A systems biological approach suggests that transcriptional feedback regulation by dual‐specificity phosphatase 6 shapes extracellular signal‐related kinase activity in RAS‐transformed fibroblasts

Nils Blüthgen; Stefan Legewie; Szymon M. Kiełbasa; Anja Schramme; Oleg Tchernitsa; Jana Keil; Andrea Solf; Martin Vingron; Reinhold Schäfer; Hanspeter Herzel; Christine Sers

Mitogen‐activated protein kinase (MAPK) signaling determines crucial cell fate decisions in most cell types, and mediates cellular transformation in many types of cancer. The activity of MAPK is controlled by reversible phosphorylation, and the quantitative characteristics of MAPK activation determine the cellular response. Many systems biological studies have analyzed the activation kinetics and the dose–response behavior of the MAPK signaling pathway. Here we investigate how the pathway activity is controlled by transcriptional feedback loops. Initially, we predict that MAPK signaling regulates phosphatases, by integrating promoter sequence data and ontology‐based classification of gene function. From this, we deduce that MAPK signaling might be controlled by transcriptional negative feedback regulation via dual‐specificity phosphatases (DUSPs), and implement a mathematical model to further test this hypothesis. Using time‐resolved measurements of pathway activity and gene expression, we employ a model selection approach, and select DUSP6 as a highly likely candidate for shaping the activity of the MAPK pathway during cellular transformation caused by oncogenic RAS. Two predictions from the model were confirmed: first, feedback regulation requires that DUSP6 mRNA and protein are unstable; and second, the activation kinetics of MAPK are ultrasensitive. Taken together, an integrated systems biological approach reveals that transcriptional negative feedback controls the kinetics and the extent of MAPK activation under both physiological and pathological conditions.

Collaboration


Top Co-Authors

Hanspeter Herzel

Humboldt University of Berlin

Dieter Beule

Humboldt University of Berlin

Jan O. Korbel

European Bioinformatics Institute
