Publication


Featured research published by Nigel J. Martin.


Nucleic Acids Research | 2003

The CATH database: an extended protein family resource for structural and functional genomics

Frances M. G. Pearl; C. F. Bennett; James E. Bray; Andrew P. Harrison; Nigel J. Martin; Adrian J. Shepherd; Ian Sillitoe; Janet M. Thornton; Christine A. Orengo

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34 287 domain structures classified into 1383 superfamilies and 3285 sequence families. Each structural family is expanded with domain sequence relatives recruited from GenBank using a variety of efficient sequence search protocols and reliable thresholds. This extended resource, known as the CATH-protein family database (CATH-PFDB), contains a total of 310 000 domain sequences classified into 26 812 sequence families. New sequence search protocols have been designed, based on these intermediate sequence libraries, to allow more regular updating of the classification. Further developments include the adaptation of a recently developed method for rapid structure comparison, based on secondary structure matching, for domain boundary assignment. The philosophy behind this method, CATHEDRAL, is the recognition of recurrent folds already classified in CATH. Benchmarking of CATHEDRAL, using manually validated domain assignments, demonstrated that 43% of domain boundaries could be completely automatically assigned. This is an improvement on a previous consensus approach for which only 10-20% of domains could be reliably processed in a completely automated fashion. Since domain boundary assignment is a significant bottleneck in the classification of new structures, CATHEDRAL will also help to increase the frequency of CATH updates.


Nature Communications | 2014

A robust SNP barcode for typing Mycobacterium tuberculosis complex strains

Francesc Coll; Ruth McNerney; José Afonso Guerra-Assunção; Judith R. Glynn; João Perdigão; Miguel Viveiros; Isabel Portugal; Arnab Pain; Nigel J. Martin; Taane G. Clark

Strain-specific genomic diversity in the Mycobacterium tuberculosis complex (MTBC) is an important factor in pathogenesis that may affect virulence, transmissibility, host response and emergence of drug resistance. Several systems have been proposed to classify MTBC strains into distinct lineages and families. Here, we investigate single-nucleotide polymorphisms (SNPs) as robust (stable) markers of genetic variation for phylogenetic analysis. We identify ~92k SNPs across a global collection of 1,601 genomes. The SNP-based phylogeny is consistent with the gold-standard regions of difference (RD) classification system. Of the ~7k strain-specific SNPs identified, 62 markers are proposed to discriminate known circulating strains. This SNP-based barcode is the first to cover all main lineages, and classifies a greater number of sublineages than current alternatives. It may be used to classify clinical isolates to evaluate tools to control the disease, including therapeutics and vaccines whose effectiveness may vary by strain type.
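As a rough illustration of how a SNP barcode of this kind could be applied, the sketch below looks up a sample's base calls at a handful of marker positions and reports every lineage whose defining allele is present. The positions, alleles and lineage names are invented placeholders rather than the 62 markers proposed in the paper, and the function name `classify` is hypothetical.

```python
# Hypothetical barcode: genome position -> (lineage-defining allele, lineage).
# These positions, alleles and names are placeholders, not markers from the paper.
BARCODE = {
    1000: ("A", "lineage1"),
    2000: ("T", "lineage2"),
    3000: ("G", "lineage2.1"),
}


def classify(sample_calls):
    """Return the lineages whose marker allele is observed in the sample."""
    hits = []
    for position, (allele, lineage) in BARCODE.items():
        # A missing call (position absent from the sample) never matches.
        if sample_calls.get(position) == allele:
            hits.append(lineage)
    return sorted(hits)


# Base calls for one hypothetical isolate at the barcode positions.
sample = {1000: "C", 2000: "T", 3000: "G"}
print(classify(sample))  # ['lineage2', 'lineage2.1']
```

A nested result such as a lineage plus one of its sublineages is what a hierarchical barcode would be expected to produce; a real tool would also need to handle mixed or low-quality calls.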


Genome Biology | 2004

Consensus clustering and functional interpretation of gene-expression data

Stephen Swift; Allan Tucker; Veronica Vinciotti; Nigel J. Martin; Christine A. Orengo; Xiaohui Liu; Paul Kellam

Microarray analysis using clustering algorithms can suffer from lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFκB and the unfolded protein response in certain B-cell lymphomas.
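The idea of combining several clusterings can be sketched with a simple pairwise agreement matrix: for each pair of genes, count the fraction of methods that place them in the same cluster, then keep only the groupings that enough methods agree on. This is a generic co-association sketch under that assumption, not the algorithm from the paper; the function names, the threshold parameter and the example labels are all hypothetical.

```python
from itertools import combinations


def agreement_matrix(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in labelings:
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: c / len(labelings) for pair, c in counts.items()}


def consensus_groups(labelings, threshold=1.0):
    """Merge items linked by pairs whose agreement meets the threshold."""
    n = len(labelings[0])
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in agreement_matrix(labelings).items():
        if score >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for item in range(n):
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Three hypothetical clusterings of five genes; label values are arbitrary per method.
labelings = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
print(consensus_groups(labelings))                 # [[0, 1], [2, 3], [4]]
print(consensus_groups(labelings, threshold=0.6))  # [[0, 1], [2, 3, 4]]
```

With the strict threshold only pairs that every method co-clusters are kept, so gene 4 stays alone; relaxing the threshold to a two-out-of-three majority pulls it into the second group.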


Genome Medicine | 2015

Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences

Francesc Coll; Ruth McNerney; Mark D. Preston; José Afonso Guerra-Assunção; Andrew Warry; Grant A. Hill-Cawthorne; Kim Mallard; Mridul Nair; Anabela Miranda; Adriana Alves; João Perdigão; Miguel Viveiros; Isabel Portugal; Zahra Hasan; Rumina Hasan; Judith R. Glynn; Nigel J. Martin; Arnab Pain; Taane G. Clark

Mycobacterium tuberculosis drug resistance (DR) challenges effective tuberculosis disease control. Current molecular tests examine limited numbers of mutations, and although whole genome sequencing approaches could fully characterise DR, data complexity has restricted their clinical application. A library (1,325 mutations) predictive of DR for 15 anti-tuberculosis drugs was compiled and validated for 11 of them using genomic-phenotypic data from 792 strains. A rapid online ‘TB-Profiler’ tool was developed to report DR and strain-type profiles directly from raw sequences. Using our DR mutation library, in silico diagnostic accuracy was superior to some commercial diagnostics and alternative databases. The library will facilitate sequence-based drug-susceptibility testing.


Nucleic Acids Research | 2001

VIDA: a virus database system for the organization of animal virus genome open reading frames

M. Mar Albà; David A. Lee; Frances M. G. Pearl; Adrian J. Shepherd; Nigel J. Martin; Christine A. Orengo; Paul Kellam

VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html.


Nucleic Acids Research | 2007

Gene3D: comprehensive structural and functional annotation of genomes

Corin Yeats; Jonathan G. Lees; Adam James Reid; Paul Kellam; Nigel J. Martin; Xiaohui Liu; Christine A. Orengo

Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein–protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk/


Bioinformatics | 2012

SpolPred: rapid and accurate prediction of Mycobacterium tuberculosis spoligotypes from short genomic sequences

Francesc Coll; Kim Mallard; Mark D. Preston; Stephen D. Bentley; Julian Parkhill; Ruth McNerney; Nigel J. Martin; Taane G. Clark

Summary: Spoligotyping is a well-established genotyping technique based on the presence of unique DNA sequences in Mycobacterium tuberculosis (Mtb), the causal agent of tuberculosis disease (TB). Although advances in sequencing technologies are leading to whole-genome bacterial characterization, tens of thousands of isolates have been spoligotyped, giving a global view of Mtb strain diversity. To bridge the gap, we have developed SpolPred, a software to predict the spoligotype from raw sequence reads. Our approach is compared with experimentally and de novo assembly determined strain types in a set of 44 Mtb isolates. In silico and experimental results are identical for almost all isolates (39/44). However, SpolPred detected five experimentally false spoligotypes and was more accurate and faster than the assembling strategy. Application of SpolPred to an additional seven isolates with no laboratory data led to types that clustered with identical experimental types in a phylogenetic analysis using single-nucleotide polymorphisms. Our results demonstrate the usefulness of the tool and its role in revealing experimental limitations. Availability and implementation: SpolPred is written in C and is available from www.pathogenseq.org/spolpred. Supplementary information: Supplementary data are available at Bioinformatics Online.


Nucleic Acids Research | 2001

A rapid classification protocol for the CATH Domain Database to support structural genomics

Frances M. G. Pearl; Nigel J. Martin; James E. Bray; Daniel W. A. Buchan; Andrew P. Harrison; David A. Lee; Gabrielle A. Reeves; Adrian J. Shepherd; Ian Sillitoe; Annabel E. Todd; Janet M. Thornton; Christine A. Orengo

In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25,320 structural domains and a further 160,000 sequence relatives has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.


Tuberculosis | 2014

PolyTB: A genomic variation map for Mycobacterium tuberculosis

Francesc Coll; Mark D. Preston; José Afonso Guerra-Assunção; Grant Hill-Cawthorn; David Harris; João Perdigão; Miguel Viveiros; Isabel Portugal; Francis Drobniewski; Sebastien Gagneux; Judith R. Glynn; Arnab Pain; Julian Parkhill; Ruth McNerney; Nigel J. Martin; Taane G. Clark

Summary: Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is the second major cause of death from an infectious disease worldwide. Recent advances in DNA sequencing are leading to the ability to generate whole genome information in clinical isolates of M. tuberculosis complex (MTBC). The identification of informative genetic variants such as phylogenetic markers and those associated with drug resistance or virulence will help barcode Mtb in the context of epidemiological, diagnostic and clinical studies. Mtb genomic datasets are increasingly available as raw sequences, which are potentially difficult and computer intensive to process, and compare across studies. Here we have processed the raw sequence data (>1500 isolates, eight studies) to compile a catalogue of SNPs (n = 74,039, 63% non-synonymous, 51.1% in more than one isolate, i.e. non-private), small indels (n = 4810) and larger structural variants (n = 800). We have developed the PolyTB web-based tool (http://pathogenseq.lshtm.ac.uk/polytb) to visualise the resulting variation and important meta-data (e.g. in silico inferred strain-types, location) within geographical map and phylogenetic views. This resource will allow researchers to identify polymorphisms within candidate genes of interest, as well as examine the genomic diversity and distribution of strains. PolyTB source code is freely available to researchers wishing to develop similar tools for their pathogen of interest.


Data Integration in the Life Sciences | 2006

Data access and integration in the ISPIDER proteomics grid

Lucas Zamboulis; Hao Fan; Khalid Belhajjame; Jennifer A. Siepen; Andrew C. Jones; Nigel J. Martin; Alexandra Poulovassilis; Simon J. Hubbard; Suzanne M. Embury; Norman W. Paton

Grid computing has great potential for supporting the integration of complex, fast changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources.

Collaboration


Dive into Nigel J. Martin's collaboration.

Top Co-Authors

Paul Kellam
Imperial College London

Allan Tucker
Brunel University London

Stephen Swift
Brunel University London