Martin C. Frith | Researchain

Publication

Featured research published by Martin C. Frith.

Nucleic Acids Research | 2009

MEME Suite: tools for motif discovery and searching

Timothy L. Bailey; Mikael Bodén; Fabian A. Buske; Martin C. Frith; Charles E. Grant; Luca Clementi; Jingyuan Ren; Wilfred W. Li; William Stafford Noble

The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms—MAST, FIMO and GLAM2SCAN—allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm Tomtom. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and Tomtom), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.

Nature Biotechnology | 2005

Assessing computational tools for the discovery of transcription factor binding sites

Martin Tompa; Nan Li; Timothy L. Bailey; George M. Church; Bart De Moor; Eleazar Eskin; Alexander V. Favorov; Martin C. Frith; Yutao Fu; W. James Kent; Vsevolod J. Makeev; Andrei A. Mironov; William Stafford Noble; Giulio Pavesi; Mireille Régnier; Nicolas Simonis; Saurabh Sinha; Gert Thijs; Jacques van Helden; Mathias Vandenbogaert; Zhiping Weng; Christopher T. Workman; Chun Ye; Zhou Zhu

The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.

Nature Genetics | 2006

Genome-wide analysis of mammalian promoter architecture and evolution

Piero Carninci; Albin Sandelin; Boris Lenhard; Shintaro Katayama; Kazuro Shimokawa; Jasmina Ponjavic; Colin A. Semple; Martin S. Taylor; Pär G. Engström; Martin C. Frith; Alistair R. R. Forrest; Wynand B.L. Alkema; Sin Lam Tan; Charles Plessy; Rimantas Kodzius; Timothy Ravasi; Takeya Kasukawa; Shiro Fukuda; Mutsumi Kanamori-Katayama; Yayoi Kitazume; Hideya Kawaji; Chikatoshi Kai; Mari Nakamura; Hideaki Konno; Kenji Nakano; Salim Mottagui-Tabar; Peter Arner; Alessandra Chesi; Stefano Gustincich; Francesca Persichetti

Mammalian promoters can be separated into two classes, conserved TATA box–enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3′ UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.

Genome Research | 2011

Adaptive seeds tame genomic sequence comparison

Szymon M. Kiełbasa; Raymond Wan; Kengo Sato; Paul Horton; Martin C. Frith

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.
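The idea of choosing seeds by rareness rather than by fixed length can be illustrated with a small sketch. This is not LAST's implementation: the function names and the `max_hits` parameter are hypothetical, and the occurrence count is done by naive rescanning, whereas a practical tool would use an index over the reference. Starting at each query position, the match is lengthened one base at a time until it occurs at most `max_hits` times in the reference.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def adaptive_seeds(reference, query, max_hits=2):
    """Return (query_start, length, hits) for each query position.

    Assumption for illustration: a seed is the shortest match starting at
    query_start that occurs at most max_hits times in the reference.
    Positions whose match vanishes, or never becomes rare enough before
    the query ends, yield no seed.
    """
    seeds = []
    for i in range(len(query)):
        for length in range(1, len(query) - i + 1):
            hits = count_occurrences(reference, query[i:i + length])
            if hits == 0:
                break  # no match of this length, so longer ones fail too
            if hits <= max_hits:
                seeds.append((i, length, hits))
                break  # rare enough: stop lengthening
    return seeds


if __name__ == "__main__":
    reference = "ACGTACGTTTACGA"
    query = "ACGA"
    for seed in adaptive_seeds(reference, query, max_hits=1):
        print(seed)
```

Because each position keeps lengthening its match until the hit count drops to the threshold, common words in the reference produce longer seeds and rare words produce shorter ones, which bounds the number of matches per query position.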

Nucleic Acids Research | 2003

Cluster-Buster: finding dense clusters of motifs in DNA sequences

Martin C. Frith; Michael C. Li; Zhiping Weng

The signals that determine activation and repression of specific genes in response to appropriate stimuli are one of the most important, but least understood, types of information encoded in genomic DNA. The nucleotide sequence patterns, or motifs, preferentially bound by various transcription factors have been collected in databases. However, these motifs appear to be individually too short and degenerate to enable detection of functional enhancer and silencer elements within a large genome. Several groups have proposed that dense clusters of motifs may diagnose regulatory regions more accurately. Cluster-Buster is the third incarnation of our software for finding clusters of pre-specified motifs in DNA sequences. We offer a Cluster-Buster web server at http://zlab.bu.edu/cluster-buster/.
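As a rough illustration of why motif density can be more informative than single motif hits, the sketch below counts motif occurrences inside a sliding window and reports windows that reach a threshold. It is a deliberately simplified stand-in, not Cluster-Buster's method: motifs are treated here as exact strings, and the function names, `window` and `min_hits` are hypothetical choices made for this example.

```python
def motif_positions(sequence, motifs):
    """Return sorted start positions of every exact motif occurrence."""
    positions = []
    for motif in motifs:
        start = sequence.find(motif)
        while start != -1:
            positions.append(start)
            start = sequence.find(motif, start + 1)
    return sorted(positions)


def dense_windows(sequence, motifs, window=20, min_hits=3):
    """Return (window_start, hit_count) for each window of the given size
    that contains at least min_hits motif start positions."""
    positions = motif_positions(sequence, motifs)
    results = []
    last_start = max(0, len(sequence) - window)
    for w in range(last_start + 1):
        hits = sum(1 for p in positions if w <= p < w + window)
        if hits >= min_hits:
            results.append((w, hits))
    return results


if __name__ == "__main__":
    sequence = "T" * 10 + "GATAAGATAAGATA" + "T" * 10
    print(dense_windows(sequence, ["GATA"], window=12, min_hits=3))
```

A single short motif such as `GATA` is expected to occur often by chance in a large genome, so the windowed count only flags regions where several occurrences fall close together.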

PLOS Computational Biology | 2008

Discovering Sequence Motifs with Arbitrary Insertions and Deletions

Martin C. Frith; Neil F. W. Saunders; Bostjan Kobe; Timothy L. Bailey

Biology is encoded in molecular sequences: deciphering this encoding remains a grand scientific challenge. Functional regions of DNA, RNA, and protein sequences often exhibit characteristic but subtle motifs; thus, computational discovery of motifs in sequences is a fundamental and much-studied problem. However, most current algorithms do not allow for insertions or deletions (indels) within motifs, and the few that do have other limitations. We present a method, GLAM2 (Gapped Local Alignment of Motifs), for discovering motifs allowing indels in a fully general manner, and a companion method GLAM2SCAN for searching sequence databases using such motifs. GLAM2 is a generalization of the gapless Gibbs sampling algorithm. It re-discovers variable-width protein motifs from the PROSITE database significantly more accurately than the alternative methods PRATT and SAM-T2K. Furthermore, it usefully refines protein motifs from the ELM database: in some cases, the refined motifs make orders of magnitude fewer overpredictions than the original ELM regular expressions. GLAM2 performs respectably on the BAliBASE multiple alignment benchmark, and may be superior to leading multiple alignment methods for “motif-like” alignments with N- and C-terminal extensions. Finally, we demonstrate the use of GLAM2 to discover protein kinase substrate motifs and a gapped DNA motif for the LIM-only transcriptional regulatory complex: using GLAM2SCAN, we identify promising targets for the latter. GLAM2 is especially promising for short protein motifs, and it should improve our ability to identify the protein cleavage sites, interaction sites, post-translational modification attachment sites, etc., that underlie much of biology. It may be equally useful for arbitrarily gapped motifs in DNA and RNA, although fewer examples of such motifs are known at present. GLAM2 is public domain software, available for download at http://bioinformatics.org.au/glam2.

PLOS Genetics | 2006

Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs

Norihiro Maeda; Takeya Kasukawa; Rieko Oyama; Julian Gough; Martin C. Frith; Pär G. Engström; Boris Lenhard; Rajith N. Aturaliya; Serge Batalov; Kirk W. Beisel; Colin F. Fletcher; Alistair R. R. Forrest; Masaaki Furuno; David E. Hill; Masayoshi Itoh; Mutsumi Kanamori-Katayama; Shintaro Katayama; Masaru Katoh; Tsugumi Kawashima; John Quackenbush; Timothy Ravasi; Brian Z. Ring; Kazuhiro Shibata; Koji Sugiura; Yoichi Takenaka; Rohan D. Teasdale; Christine A. Wells; Yunxia Zhu; Chikatoshi Kai; Jun Kawai

The international FANTOM consortium aims to produce a comprehensive picture of the mammalian transcriptome, based upon an extensive cDNA collection and functional annotation of full-length enriched cDNAs. The previous dataset, FANTOM2, comprised 60,770 full-length enriched cDNAs. Functional annotation revealed that this cDNA dataset contained only about half of the estimated number of mouse protein-coding genes, indicating that a number of cDNAs still remained to be collected and identified. To pursue the complete gene catalog that covers all predicted mouse genes, cloning and sequencing of full-length enriched cDNAs has been continued since FANTOM2. In FANTOM3, 42,031 newly isolated cDNAs were subjected to functional annotation, and the annotation of 4,347 FANTOM2 cDNAs was updated. To accomplish accurate functional annotation, we improved our automated annotation pipeline by introducing new coding sequence prediction programs and developed a Web-based annotation interface for simplifying the annotation procedures to reduce manual annotation errors. Automated coding sequence and function prediction was followed with manual curation and review by expert curators. A total of 102,801 full-length enriched mouse cDNAs were annotated. Out of 102,801 transcripts, 56,722 were functionally annotated as protein coding (including partial or truncated transcripts), providing to our knowledge the greatest current coverage of the mouse proteome by full-length cDNAs. The total number of distinct non-protein-coding transcripts increased to 34,030. The FANTOM3 annotation system, consisting of automated computational prediction, manual curation, and final expert curation, facilitated the comprehensive characterization of the mouse transcriptome, and could be applied to the transcriptomes of other species.

PLOS Genetics | 2006

The Abundance of Short Proteins in the Mammalian Proteome

Martin C. Frith; Alistair Raymond Russell Forrest; Ehsan Nourbakhsh; Ken C. Pang; Chikatoshi Kai; Jun Kawai; Piero Carninci; Yoshihide Hayashizaki; Timothy L. Bailey; Sean M. Grimmond

Short proteins play key roles in cell signalling and other processes, but their abundance in the mammalian proteome is unknown. Current catalogues of mammalian proteins exhibit an artefactual discontinuity at a length of 100 aa, so that protein abundance peaks just above this length and falls off sharply below it. To clarify the abundance of short proteins, we identify proteins in the FANTOM collection of mouse cDNAs by analysing synonymous and non-synonymous substitutions with the computer program CRITICA. This analysis confirms that there is no real discontinuity at length 100. Roughly 10% of mouse proteins are shorter than 100 aa, although the majority of these are variants of proteins longer than 100 aa. We identify many novel short proteins, including a “dark matter” subset containing ones that lack detectable homology to other known proteins. Translation assays confirm that some of these novel proteins can be translated and localised to the secretory pathway.

Current Biology | 2002

A ubiquitous and conserved signal for RNA localization in chordates

J. Nicholas Betley; Martin C. Frith; Joel H. Graber; Soheun Choo; James O. Deshler

During oogenesis in Xenopus laevis, several RNAs that localize to the vegetal cortex via one of three temporally defined pathways have been identified. Although individual mRNAs utilize only one pathway, there is functional overlap and apparent continuity between them, suggesting that common cis-acting sequences may exist. Because previous work with the Vg1 mRNA revealed that short nontandem repeats are important for localization, we developed a new computer program, called REPFIND, to expedite the identification of repeated motifs in other localized RNAs. Here we show that clusters of short CAC-containing motifs characterize the localization elements (LEs) of virtually all mRNAs localized to the vegetal cortex of Xenopus oocytes. A search for this signal in GenBank [9] resulted in the identification of new localized mRNAs, demonstrating the applicability of REPFIND to predict localized RNAs. CAC-rich LEs are also found in ascidians and other vertebrates, indicating that these cis regulatory elements are conserved in chordates. Interestingly, biochemical evidence shows that distinct CAC-containing motifs have different functions in the localization process. Thus, clusters of CAC-containing motifs are a ubiquitous signal for RNA localization and can signal localization in a variety of pathways through slight variations in sequence composition.

PLOS Genetics | 2006

Clusters of internally primed transcripts reveal novel long noncoding RNAs

Masaaki Furuno; Ken C. Pang; Noriko Ninomiya; Shiro Fukuda; Martin C. Frith; Chikatoshi Kai; Jun Kawai; Piero Carninci; Yoshihide Hayashizaki; John S. Mattick; Harukazu Suzuki

Non-protein-coding RNAs (ncRNAs) are increasingly being recognized as having important regulatory roles. Although much recent attention has focused on tiny 22- to 25-nucleotide microRNAs, several functional ncRNAs are orders of magnitude larger in size. Examples of such macro ncRNAs include Xist and Air, which in mouse are 18 and 108 kilobases (Kb), respectively. We surveyed the 102,801 FANTOM3 mouse cDNA clones and found that Air and Xist were present not as single, full-length transcripts but as a cluster of multiple, shorter cDNAs, which were unspliced, had little coding potential, and were most likely primed from internal adenine-rich regions within longer parental transcripts. We therefore conducted a genome-wide search for regional clusters of such cDNAs to find novel macro ncRNA candidates. Sixty-six regions were identified, each of which mapped outside known protein-coding loci and which had a mean length of 92 Kb. We detected several known long ncRNAs within these regions, supporting the basic rationale of our approach. In silico analysis showed that many regions had evidence of imprinting and/or antisense transcription. These regions were significantly associated with microRNAs and transcripts from the central nervous system. We selected eight novel regions for experimental validation by northern blot and RT-PCR and found that the majority represent previously unrecognized noncoding transcripts that are at least 10 Kb in size and predominantly localized in the nucleus. Taken together, the data not only identify multiple new ncRNAs but also suggest the existence of many more macro ncRNAs like Xist and Air.