Anthony Mathelier
University of Oslo
Publications
Featured research published by Anthony Mathelier.
Nucleic Acids Research | 2014
Anthony Mathelier; Xiaobei Zhao; Allen W. Zhang; François Parcy; Rebecca Worsley-Hunt; David J. Arenillas; Sorana Buchman; Chih-yu Chen; Alice Yi Chou; Hans Ienasescu; Jonathan S. Lim; Casper Shyr; Ge Tan; Michelle Zhou; Boris Lenhard; Albin Sandelin; Wyeth W. Wasserman
JASPAR (http://jaspar.genereg.net) is the largest open-access database of matrix-based nucleotide profiles describing the binding preference of transcription factors from multiple species. The fifth major release greatly expands the heart of JASPAR—the JASPAR CORE subcollection, which contains curated, non-redundant profiles—with 135 new curated profiles (74 in vertebrates, 8 in Drosophila melanogaster, 10 in Caenorhabditis elegans and 43 in Arabidopsis thaliana; a 30% increase in total) and 43 older updated profiles (36 in vertebrates, 3 in D. melanogaster and 4 in A. thaliana; a 9% update in total). The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets. In addition, the web interface has been enhanced with advanced capabilities in browsing, searching and subsetting. Finally, the new JASPAR release is accompanied by a new BioPython package, a new R tool package and a new R/Bioconductor data package to facilitate access for both manual and automated methods.
Nucleic Acids Research | 2016
Anthony Mathelier; Oriol Fornes; David J. Arenillas; Chih-yu Chen; Grégoire Denay; Jessica Lee; Wenqiang Shi; Casper Shyr; Ge Tan; Rebecca Worsley-Hunt; Allen W. Zhang; François Parcy; Boris Lenhard; Albin Sandelin; Wyeth W. Wasserman
JASPAR (http://jaspar.genereg.net) is an open-access database storing curated, non-redundant transcription factor (TF) binding profiles representing transcription factor binding preferences as position frequency matrices for multiple species in six taxonomic groups. For this 2016 release, we expanded the JASPAR CORE collection with 494 new TF binding profiles (315 in vertebrates, 11 in nematodes, 3 in insects, 1 in fungi and 164 in plants) and updated 59 profiles (58 in vertebrates and 1 in fungi). The introduced profiles represent an 83% expansion and 10% update when compared to the previous release. We updated the structural annotation of the TF DNA binding domains (DBDs) following a published hierarchical structural classification. In addition, we introduced 130 transcription factor flexible models trained on ChIP-seq data for vertebrates, which capture dinucleotide dependencies within TF binding sites. This new JASPAR release is accompanied by a new web tool to infer JASPAR TF binding profiles recognized by a given TF protein sequence. Moreover, we provide the users with a Ruby module complementing the JASPAR API to ease programmatic access and use of the JASPAR collection of profiles. Finally, we provide the JASPAR2016 R/Bioconductor data package with the data of this release.
PLOS Computational Biology | 2013
Anthony Mathelier; Wyeth W. Wasserman
Finding where transcription factors (TFs) bind to the DNA is of key importance to decipher gene regulation at a transcriptional level. Classically, computational prediction of TF binding sites (TFBSs) is based on basic position weight matrices (PWMs) which quantitatively score binding motifs based on the observed nucleotide patterns in a set of TFBSs for the corresponding TF. Such models make the strong assumption that each nucleotide participates independently in the corresponding DNA-protein interaction and do not account for flexible length motifs. We introduce transcription factor flexible models (TFFMs) to represent TF binding properties. Based on hidden Markov models, TFFMs are flexible, and can model both position interdependence within TFBSs and variable length motifs within a single dedicated framework. The availability of thousands of experimentally validated DNA-TF interaction sequences from ChIP-seq allows for the generation of models that perform as well as PWMs for stereotypical TFs and can improve performance for TFs with flexible binding characteristics. We present a new graphical representation of the motifs that convey properties of position interdependence. TFFMs have been assessed on ChIP-seq data sets coming from the ENCODE project, revealing that they can perform better than both PWMs and the dinucleotide weight matrix extension in discriminating ChIP-seq from background sequences. Under the assumption that ChIP-seq signal values are correlated with the affinity of the TF-DNA binding, we find that TFFM scores correlate with ChIP-seq peak signals. Moreover, using available TF-DNA affinity measurements for the Max TF, we demonstrate that TFFMs constructed from ChIP-seq data correlate with published experimentally measured DNA-binding affinities. Finally, TFFMs allow for the straightforward computation of an integrated TF occupancy score across a sequence. 
These results demonstrate the capacity of TFFMs to accurately model DNA-protein interactions, while providing a single unified framework suitable for the next generation of TFBS prediction.
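The independence assumption behind classical PWMs, described in the abstract above, can be illustrated with a minimal Python sketch. The counts below are invented for illustration (they are not a JASPAR profile), and the pseudocount, the uniform background, and the function names `pfm_to_pwm`, `score` and `best_hit` are assumptions of this example, not part of the TFFM software.

```python
import math

# Toy position frequency matrix (PFM): counts per nucleotide at 4 positions.
# These counts are invented for illustration, not taken from JASPAR.
PFM = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 1, 0],
    "G": [1, 0, 8, 0],
    "T": [0, 0, 0, 8],
}


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds weights against a uniform background."""
    length = len(next(iter(pfm.values())))
    pwm = {base: [] for base in pfm}
    for i in range(length):
        # Column total including one pseudocount per nucleotide.
        total = sum(pfm[b][i] for b in pfm) + pseudocount * len(pfm)
        for b in pfm:
            prob = (pfm[b][i] + pseudocount) / total
            pwm[b].append(math.log2(prob / background))
    return pwm


def score(pwm, site):
    """Sum per-position weights: each position contributes independently."""
    return sum(pwm[base][i] for i, base in enumerate(site))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(next(iter(pwm.values())))
    windows = [
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(windows)


pwm = pfm_to_pwm(PFM)
hit_score, hit_start = best_hit(pwm, "TTACGTTT")
```

Because the score is a plain sum over positions, such a model cannot express that a nucleotide at one position changes the preference at a neighbouring position, which is the limitation the hidden Markov model-based TFFMs are designed to address.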
Nucleic Acids Research | 2018
Aziz Khan; Oriol Fornes; Arnaud Stigliani; Marius Gheorghe; Jaime A Castro-Mondragon; Robin van der Lee; Adrien Bessy; Jeanne Cheneby; Shubhada Rajabhau Kulkarni; Ge Tan; Damir Baranasic; David J. Arenillas; Albin Sandelin; Klaas Vandepoele; Boris Lenhard; Benoit Ballester; Wyeth W. Wasserman; François Parcy; Anthony Mathelier
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.
Trends in Genetics | 2015
Anthony Mathelier; Wenqiang Shi; Wyeth W. Wasserman
It has long been appreciated that variations in regulatory regions of genes can impact gene expression. With the advent of whole-genome sequencing (WGS), it has become possible to begin cataloging these noncoding variants. Evidence continues to accumulate linking clinical cases with cis-regulatory element disruption in a wide range of diseases. Identifying variants is becoming routine, but assessing their impact on regulation remains challenging. Bioinformatics approaches that identify variations functionally altering transcription factor (TF) binding are increasingly important for meeting this challenge. We present the current state of computational tools and resources for identifying the genomic regulatory components (cis-regulatory regions and TF binding sites, TFBSs) controlling gene transcriptional regulation. We review how such approaches can be used to interpret the potential disease causality of point mutations and small insertions or deletions. We hope this will motivate further the development of methods enabling the identification of etiological cis-regulatory variations.
Nature Biotechnology | 2017
Derek De Rie; Imad Abugessaisa; Tanvir Alam; Erik Arner; Peter Arner; Haitham Ashoor; Gaby Åström; Magda Babina; Nicolas Bertin; A. Maxwell Burroughs; Ailsa Carlisle; Carsten O. Daub; Michael Detmar; Ruslan Deviatiiarov; Alexandre Fort; Claudia Gebhard; Dan Goldowitz; Sven Guhl; Thomas Ha; Jayson Harshbarger; Akira Hasegawa; Kosuke Hashimoto; Meenhard Herlyn; Peter Heutink; Kelly J Hitchens; Chung Chau Hon; Edward Huang; Yuri Ishizu; Chieko Kai; Takeya Kasukawa
MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.
Nucleic Acids Research | 2017
Marina Lizio; Jayson Harshbarger; Imad Abugessaisa; Shuei Noguchi; Atsushi Kondo; Jessica Severin; Christopher J. Mungall; David J. Arenillas; Anthony Mathelier; Yulia A. Medvedeva; Andreas Lennartsson; Finn Drabløs; Jordan A. Ramilowski; Owen J. L. Rackham; Julian Gough; Robin Andersson; Albin Sandelin; Hans Ienasescu; Hiromasa Ono; Hidemasa Bono; Yoshihide Hayashizaki; Piero Carninci; Alistair R. R. Forrest; Takeya Kasukawa; Hideya Kawaji
Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and the functionalities of the database systems have been enhanced. We believe that our data and web systems are invaluable resources, and we think the scientific community will benefit from this recent update to deepen their understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.
Genome Biology | 2015
Anthony Mathelier; Calvin Lefebvre; Allen W. Zhang; David J. Arenillas; Jiarui Ding; Wyeth W. Wasserman; Sohrab P. Shah
Background: With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. Results: We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Conclusions: Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
BioMed Research International | 2014
Anne Saumet; Anthony Mathelier; Charles-Henri Lecellier
MicroRNAs orchestrate the expression of the genome and impact many, if not all, cellular processes. Their deregulation is thus often causative of human malignancies, including cancers. Numerous studies have implicated microRNAs in the different steps of tumorigenesis including initiation, progression, metastasis, and resistance to chemo/radiotherapies. Thus, microRNAs constitute appealing targets for novel anticancer therapeutic strategies aimed at restoring their expression or function. As microRNAs are present in a variety of human cancer types, microRNA profiles can be used as tumor-specific signatures to detect various cancers (diagnosis), to predict their outcome (prognosis), and to monitor their treatment (theranosis). In this review, we present the different aspects of microRNA biology that make them remarkable molecules in the emerging field of personalized medicine against cancers and provide several examples of their industrial exploitation.
BMC Genomics | 2014
Rebecca Worsley Hunt; Anthony Mathelier; Luis del Peso; Wyeth W. Wasserman
Background: Chromatin immunoprecipitation (ChIP) coupled to high-throughput sequencing (ChIP-Seq) techniques can reveal DNA regions bound by transcription factors (TF). Analysis of the ChIP-Seq regions is now a central component in gene regulation studies. The need remains strong for methods to improve the interpretation of ChIP-Seq data and the study of specific TF binding sites (TFBS). Results: We introduce a set of methods to improve the interpretation of ChIP-Seq data, including the inference of mediating TFs based on TFBS motif over-representation analysis and the subsequent study of spatial distribution of TFBSs. TFBS over-representation analysis applied to ChIP-Seq data is used to detect which TFBSs arise more frequently than expected by chance. Visualization of over-representation analysis results with new composition-bias plots reveals systematic bias in over-representation scores. We introduce the BiasAway background generating software to resolve the problem. A heuristic procedure based on topological motif enrichment relative to the ChIP-Seq peaks’ local maximums highlights peaks likely to be directly bound by a TF of interest. The results suggest that on average two-thirds of a ChIP-Seq dataset’s peaks are bound by the ChIP’d TF; the origin of the remaining peaks remains undetermined. Additional visualization methods allow for the study of both inter-TFBS spatial relationships and motif-flanking sequence properties, as demonstrated in case studies for TBP and ZNF143/THAP11. Conclusions: Topological properties of TFBS within ChIP-Seq datasets can be harnessed to better interpret regulatory sequences. Using GC content corrected TFBS over-representation analysis, combined with visualization techniques and analysis of the topological distribution of TFBS, we can distinguish peaks likely to be directly bound by a TF. The new methods will empower researchers for exploration of gene regulation and TF binding.
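One generic way to build a composition-matched background for over-representation analysis is to shuffle each foreground sequence, which preserves its length and GC content exactly. The Python sketch below illustrates that idea only; the abstract does not describe how BiasAway generates its backgrounds, so this should not be read as its implementation. The sequences and the names `gc_content` and `shuffled_background` are hypothetical.

```python
import random


def gc_content(seq):
    """Fraction of G and C nucleotides in a sequence."""
    return sum(1 for base in seq if base in "GC") / len(seq)


def shuffled_background(seq, rng):
    """Permute the nucleotides of one sequence.

    The shuffled sequence has the same length and the same nucleotide
    counts as the input, so its GC content is identical.
    """
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


# Hypothetical foreground sequences standing in for ChIP-Seq peak regions.
rng = random.Random(0)  # fixed seed so the example is reproducible
foreground = ["GCGCGGCCAT", "ATATGCGCGC"]
background = [shuffled_background(seq, rng) for seq in foreground]
```

Comparing motif hit counts in `foreground` against `background` then controls for nucleotide composition, at the cost of destroying any higher-order sequence structure in the background set.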