Publication


Featured research published by Clive J. Hoggart.


Nature Genetics | 2009

Genome-wide association analysis of metabolic traits in a birth cohort from a founder population.

Chiara Sabatti; Anna-Liisa Hartikainen; Anneli Pouta; Samuli Ripatti; Jae Brodsky; Christopher Jones; Noah Zaitlen; Teppo Varilo; Marika Kaakinen; Ulla Sovio; Aimo Ruokonen; Jaana Laitinen; Eveliina Jakkula; Lachlan Coin; Clive J. Hoggart; Andrew Collins; Hannu Turunen; Stacey Gabriel; Paul Elliott; Mark I. McCarthy; Mark J. Daly; Marjo-Riitta Järvelin; Nelson B. Freimer; Leena Peltonen

Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene–environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.


PLOS Genetics | 2008

Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies.

Clive J. Hoggart; John C. Whittaker; Maria De Iorio; David J. Balding

Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
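The idea of fitting all SNPs at once under a prior with a sharp mode at zero can be illustrated with a small sketch. The code below is not the authors' algorithm: it substitutes a plain L1 penalty (the posterior mode under a Laplace prior) for the normal-exponential-gamma prior, uses a quantitative phenotype with additive genotype coding only, and runs on hypothetical toy data. All function and variable names are illustrative assumptions.

```python
from typing import List


def centre(values: List[float]) -> List[float]:
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(columns: List[List[float]], y: List[float],
                             lam: float, n_sweeps: int = 100) -> List[float]:
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 one coefficient at a time.

    `columns` holds one centred genotype vector per SNP (column-major X).
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y - X b while b is all zeros
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue  # monomorphic SNP carries no information
            # Inner product of SNP j with the residual that excludes SNP j.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n))
            updated = soft_threshold(rho, lam) / norm_sq
            delta = updated - beta[j]
            for i in range(n):
                residual[i] -= col[i] * delta
            beta[j] = updated
    return beta


# Hypothetical toy data: 4 individuals, 3 SNPs coded as allele counts.
genotypes = [[0, 0, 2, 2], [0, 2, 0, 2], [0, 2, 2, 0]]
phenotype = [-1.1, -0.9, 0.9, 1.1]

columns = [centre(g) for g in genotypes]
beta = lasso_coordinate_descent(columns, centre(phenotype), lam=1.0)
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

In this toy run only the first SNP keeps a non-zero coefficient, which mirrors the abstract's reading of a non-zero estimate as a selected SNP. A heavier-tailed prior changes the shrinkage rule but not the overall coordinate-wise structure of such a search.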


Genetic Epidemiology | 2008

Genome-wide significance for dense SNP and resequencing data

Clive J. Hoggart; Taane G. Clark; Maria De Iorio; John C. Whittaker; David J. Balding

The problem of multiple testing is an important aspect of genome‐wide association studies, and will become more important as marker densities increase. The problem has been tackled with permutation and false discovery rate procedures and with Bayes factors, but each approach faces difficulties that we briefly review. In the current context of multiple studies on different genotyping platforms, we argue for the use of truly genome‐wide significance thresholds, based on all polymorphisms whether or not typed in the study. We approximate genome‐wide significance thresholds in contemporary West African, East Asian and European populations by simulating sequence data, based on all polymorphisms as well as for a range of single nucleotide polymorphism (SNP) selection criteria. Overall we find that significance thresholds vary by a factor of >20 over the SNP selection criteria and statistical tests that we consider and can be highly dependent on sample size. We compare our results for sequence data to those derived by the HapMap Consortium and find notable differences which may be due to the small sample sizes used in the HapMap estimate. Genet. Epidemiol. 32:179–185, 2008.
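As a rough illustration of how a genome-wide threshold relates to the number of tests, the sketch below converts a family-wise error rate into a per-test threshold with the Bonferroni and Šidák corrections. This is not the simulation-based procedure described in the abstract, and the figure of one million effectively independent tests is a hypothetical input, not a value taken from the paper.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: float) -> float:
    """Per-test threshold that bounds the family-wise error rate by alpha."""
    return alpha / n_tests


def sidak_threshold(alpha: float, n_tests: float) -> float:
    """Per-test threshold giving family-wise error alpha for independent tests.

    Computes 1 - (1 - alpha) ** (1 / n_tests) in a numerically stable way.
    """
    return -math.expm1(math.log1p(-alpha) / n_tests)


# Hypothetical effective number of independent tests across the genome.
effective_tests = 1_000_000
bonferroni = bonferroni_threshold(0.05, effective_tests)
sidak = sidak_threshold(0.05, effective_tests)
```

Because linked polymorphisms are correlated, the effective number of tests is smaller than the number of polymorphisms, and it depends on the population and on the SNP selection criteria, which is why the abstract reports thresholds that vary widely.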


PLOS ONE | 2012

MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS.

Paul F. O'Reilly; Clive J. Hoggart; Yotsawat Pomyen; Federico C. F. Calboli; Paul Elliott; Marjo-Riitta Järvelin; Lachlan Coin

The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one-phenotype-at-a-time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden to single phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. There is a boost in power for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type-1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach provides 37% increased discovery. The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald Formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes.
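The ordinal regression described above can be written out as a proportional-odds model with the genotype as the outcome. The notation below is an assumed, conventional parameterisation (including the sign in front of the sum), not one quoted from the paper. The Friedewald formula is included for reference in its usual mg/dL form.

```latex
% Proportional-odds regression of genotype on K phenotypes (assumed notation).
% G \in \{0,1,2\}: allele count at one SNP; Y_1,\dots,Y_K: phenotypes.
\operatorname{logit} \Pr(G \le g \mid Y_1,\dots,Y_K)
  = \alpha_g - \sum_{k=1}^{K} \beta_k Y_k , \qquad g \in \{0, 1\}

% Joint test of association at the SNP:
H_0 : \beta_1 = \dots = \beta_K = 0

% Friedewald formula (concentrations in mg/dL):
\mathrm{LDL} \approx \mathrm{TC} - \mathrm{HDL} - \frac{\mathrm{TG}}{5}
```

Under this reading, the fitted coefficients define the linear combination of phenotypes most associated with the genotype, which is how a combination resembling the Friedewald formula could emerge at lipid loci.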


Nature Genetics | 2009

Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels

John Chambers; Weihua Zhang; Yun Li; Joban Sehmi; Mark N. Wass; Delilah Zabaneh; Clive J. Hoggart; Henry K. Bayele; Mark McCarthy; Leena Peltonen; Nelson B. Freimer; Surjit Kaila Srai; Patrick H. Maxwell; Michael J. E. Sternberg; Aimo Ruokonen; Gonçalo R. Abecasis; Marjo-Riitta Järvelin; James Scott; Paul Elliott; Jaspal S. Kooner

We carried out a genome-wide association study of hemoglobin levels in 16,001 individuals of European and Indian Asian ancestry. The most closely associated SNP (rs855791) results in a nonsynonymous (V736A) change in the serine protease domain of TMPRSS6 and a blood hemoglobin concentration 0.13 (95% CI 0.09–0.17) g/dl lower per copy of allele A (P = 1.6 × 10^-13). Our findings suggest that TMPRSS6, a regulator of hepcidin synthesis and iron handling, is crucial in hemoglobin level maintenance.


PLOS ONE | 2009

Pathway analysis of GWAS provides new insights into genetic susceptibility to 3 inflammatory diseases.

Hariklia Eleftherohorinou; Victoria J. Wright; Clive J. Hoggart; Anna-Liisa Hartikainen; Marjo-Riitta Järvelin; David J. Balding; Lachlan Coin; Michael Levin

Although the introduction of genome-wide association studies (GWAS) has greatly increased the number of genes associated with common diseases, only a small proportion of the predicted genetic contribution has so far been elucidated. Studying the cumulative variation of polymorphisms in multiple genes acting in functional pathways may provide a complementary approach to the more common single SNP association approach in understanding genetic determinants of common disease. We developed a novel pathway-based method to assess the combined contribution of multiple genetic variants acting within canonical biological pathways and applied it to data from 14,000 UK individuals with 7 common diseases. We tested inflammatory pathways for association with Crohn's disease (CD), rheumatoid arthritis (RA) and type 1 diabetes (T1D) with 4 non-inflammatory diseases as controls. Using a variable selection algorithm, we identified variants responsible for the pathway association and evaluated their use for disease prediction using a 10-fold cross-validation framework in order to calculate the out-of-sample area under the receiver operating characteristic curve (AUC). The generalisability of these predictive models was tested on an independent birth cohort from Northern Finland. Multiple canonical inflammatory pathways showed highly significant associations (p from 10^-3 to 10^-20) with CD, T1D and RA. Variable selection identified on average a set of 205 SNPs (149 genes) for T1D, 350 SNPs (189 genes) for RA and 493 SNPs (277 genes) for CD. The pattern of polymorphisms at these SNPs was found to be highly predictive of T1D (91% AUC) and RA (85% AUC), and weakly predictive of CD (60% AUC). The T1D model (without any parameter refitting) had good predictive ability (79% AUC) in the Finnish cohort. Our analysis suggests that genetic contribution to common inflammatory diseases operates through multiple genes interacting in functional pathways.
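The out-of-sample AUC mentioned above can be sketched in a few lines. The code below is a generic illustration, not the authors' pipeline: the fold assignment, the `fit_predict` callback, and the toy risk scores are all hypothetical, and the AUC is computed as the share of case-control pairs in which the case receives the higher score (ties count one half).

```python
from typing import Callable, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case (1) outscores a random control (0)."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Deterministic folds: individual i is held out in fold i mod k."""
    return [([i for i in range(n) if i % k != f],
             [i for i in range(n) if i % k == f]) for f in range(k)]


def cross_validated_auc(x: Sequence[float], y: Sequence[int], k: int,
                        fit_predict: Callable[[List[float], List[int], List[float]],
                                              List[float]]) -> float:
    """Pool held-out predictions from every fold, then compute one AUC."""
    held_out = [0.0] * len(y)
    for train, test in k_fold_indices(len(y), k):
        preds = fit_predict([x[i] for i in train], [y[i] for i in train],
                            [x[i] for i in test])
        for i, p in zip(test, preds):
            held_out[i] = p
    return auc(held_out, y)


def sign_model(train_x: List[float], train_y: List[int],
               test_x: List[float]) -> List[float]:
    """Toy model: orient the score so that cases score higher in training data."""
    case_mean = sum(v for v, l in zip(train_x, train_y) if l == 1) / train_y.count(1)
    control_mean = sum(v for v, l in zip(train_x, train_y) if l == 0) / train_y.count(0)
    sign = 1.0 if case_mean >= control_mean else -1.0
    return [sign * v for v in test_x]


# Hypothetical risk scores and case (1) / control (0) labels.
risk = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7, 0.3, 0.6]
status = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
cv_auc = cross_validated_auc(risk, status, k=5, fit_predict=sign_model)
```

The key design point is that each individual's score comes from a model fitted without that individual, so the pooled AUC is not inflated by overfitting to the training folds.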


Hypertension | 2012

Genomewide association study using a high-density single nucleotide polymorphism array and case-control design identifies a novel essential hypertension susceptibility locus in the promoter region of endothelial NO synthase

Erika Salvi; Zoltán Kutalik; Nicola Glorioso; Paola Benaglio; Francesca Frau; Tatiana Kuznetsova; Hisatomi Arima; Clive J. Hoggart; Jean Tichet; Yury P. Nikitin; Costanza Conti; Jitka Seidlerová; Valérie Tikhonoff; Katarzyna Stolarz-Skrzypek; Toby Johnson; Nabila Devos; Laura Zagato; Simonetta Guarrera; Roberta Zaninello; Andrea Calabria; Benedetta Stancanelli; Chiara Troffa; Lutgarde Thijs; Federica Rizzi; Galina Simonova; Sara Lupoli; Giuseppe Argiolas; Daniele Braga; Maria C. D'Alessio; Maria Francesca Ortu

Essential hypertension is a multifactorial disorder and is the main risk factor for renal and cardiovascular complications. The research on the genetics of hypertension has been frustrated by the small predictive value of the discovered genetic variants. The HYPERGENES Project investigated associations between genetic variants and essential hypertension in a 2-stage study of cases and controls drawn from extensively characterized cohorts recruited over many years in different European regions. The discovery phase consisted of 1865 cases and 1750 controls genotyped with a 1M Illumina array. Best hits were followed up in a validation panel of 1385 cases and 1246 controls that were genotyped with a custom array of 14 055 markers. We identified a new hypertension susceptibility locus (rs3918226) in the promoter region of the endothelial NO synthase gene (odds ratio: 1.54 [95% CI: 1.37–1.73]; combined P = 2.58 × 10^-13). A meta-analysis, using other in silico/de novo genotyping data for a total of 21 714 subjects, resulted in an overall odds ratio of 1.34 (95% CI: 1.25–1.44; P = 1.032 × 10^-14). The quantitative analysis on a population-based sample revealed an effect size of 1.91 (95% CI: 0.16–3.66) for systolic and 1.40 (95% CI: 0.25–2.55) for diastolic blood pressure. We identified in silico a potential binding site for ETS transcription factors directly next to rs3918226, suggesting a potential modulation of endothelial NO synthase expression. Biological evidence links endothelial NO synthase with hypertension, because it is a critical mediator of cardiovascular homeostasis and blood pressure control via vascular tone regulation. This finding supports the hypothesis that there may be a causal genetic variation at this locus.
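For readers unfamiliar with how an odds ratio and its 95% confidence interval arise from case-control counts, the sketch below computes an unadjusted odds ratio with a Wald interval on the log scale. It is only an illustration: the counts are hypothetical, and the estimates reported in the abstract come from the study's own models, which this code does not reproduce.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(exposed_cases: int, unexposed_cases: int,
                       exposed_controls: int, unexposed_controls: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) from a 2x2 table.

    The interval is exp(log(OR) +/- z * SE), with SE the square root of the
    summed reciprocals of the four cell counts.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: carriers vs non-carriers among cases and controls.
estimate, lower, upper = odds_ratio_wald_ci(20, 80, 10, 90)
```

An interval that excludes 1, as in the abstract's reported intervals, indicates an association at the chosen confidence level.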


PLOS Genetics | 2009

Genetic Determinants of Height Growth Assessed Longitudinally from Infancy to Adulthood in the Northern Finland Birth Cohort 1966

Ulla Sovio; Amanda J. Bennett; Iona Y. Millwood; John Molitor; Paul F. O'Reilly; Nicholas J. Timpson; Marika Kaakinen; Jaana Laitinen; Jari Haukka; Demetris Pillas; Ioanna Tzoulaki; Jassy Molitor; Clive J. Hoggart; Lachlan Coin; Anneli Pouta; Anna-Liisa Hartikainen; Nelson B. Freimer; Elisabeth Widen; Leena Peltonen; Paul Elliott; Mark McCarthy; Marjo-Riitta Järvelin

Recent genome-wide association (GWA) studies have identified dozens of common variants associated with adult height. However, it is unknown how these variants influence height growth during childhood. We derived peak height velocity in infancy (PHV1) and puberty (PHV2) and timing of pubertal height growth spurt from parametric growth curves fitted to longitudinal height growth data to test their association with known height variants. The study consisted of N = 3,538 singletons from the prospective Northern Finland Birth Cohort 1966 with genotype data and frequent height measurements (on average 20 measurements per person) from 0–20 years. Twenty-six of the 48 variants tested associated with adult height (p<0.05, adjusted for sex and principal components) in this sample, all in the same direction as in previous GWA scans. Seven SNPs in or near the genes HHIP, DLEU7, UQCC, SF3B4/SV2A, LCORL, and HIST1H1D associated with PHV1 and five SNPs in or near SOCS2, SF3B4/SV2A, C17orf67, CABLES1, and DOT1L with PHV2 (p<0.05). We formally tested variants for interaction with age (infancy versus puberty) and found biologically meaningful evidence for an age-dependent effect for the SNP in SOCS2 (p = 0.0030) and for the SNP in HHIP (p = 0.045). We did not have similar prior evidence for the association between height variants and timing of pubertal height growth spurt as we had for PHVs, and none of the associations were statistically significant after correction for multiple testing. The fact that in this sample, less than half of the variants associated with adult height had a measurable effect on PHV1 or PHV2 is likely to reflect limited power to detect these associations in this dataset. Our study is the first genetic association analysis on longitudinal height growth in a prospective cohort from birth to adulthood and gives grounding for future research on the genetic regulation of human height during different periods of growth.


Genetics | 2007

Sequence-Level Population Simulations Over Large Genomic Regions

Clive J. Hoggart; Marc Chadeau-Hyam; Taane G. Clark; Riccardo Lampariello; John C. Whittaker; Maria De Iorio; David J. Balding

Simulation is an invaluable tool for investigating the effects of various population genetics modeling assumptions on resulting patterns of genetic diversity, and for assessing the performance of statistical techniques, for example those designed to detect and measure the genomic effects of selection. It is also used to investigate the effectiveness of various design options for genetic association studies. Backward-in-time simulation methods are computationally efficient and have become widely used since their introduction in the 1980s. The forward-in-time approach has substantial advantages in terms of accuracy and modeling flexibility, but at greater computational cost. We have developed flexible and efficient simulation software and a rescaling technique to aid computational efficiency that together allow the simulation of sequence-level data over large genomic regions in entire diploid populations under various scenarios for demography, mutation, selection, and recombination, the latter including hotspots and gene conversion. Our forward evolution of genomic regions (FREGENE) software is freely available from www.ebi.ac.uk/projects/BARGEN together with an ancillary program to generate phenotype labels, either binary or quantitative. In this article we discuss limitations of coalescent-based simulation, introduce the rescaling technique that makes large-scale forward-in-time simulation feasible, and demonstrate the utility of various features of FREGENE, many not previously available.
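To make the forward-in-time idea concrete, the sketch below follows the frequency of one allele through successive generations under a basic Wright-Fisher model with optional genic selection. It is far simpler than FREGENE, which simulates whole sequences with recombination, demography and rescaling, and none of its names or parameters are taken from that software.

```python
import random
from typing import List


def wright_fisher(n_diploid: int, start_freq: float, generations: int,
                  selection: float = 0.0, seed: int = 0) -> List[float]:
    """Simulate one biallelic locus forward in time.

    Each generation, every one of the 2N gene copies picks a parent allele at
    random; `selection` is the relative fitness advantage of the tracked
    allele (0.0 means pure genetic drift). Must be greater than -1.
    """
    rng = random.Random(seed)
    n_copies = 2 * n_diploid
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Expected frequency after selection, before random sampling.
        weighted = freq * (1.0 + selection)
        expected = weighted / (weighted + (1.0 - freq))
        count = sum(1 for _ in range(n_copies) if rng.random() < expected)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


# Hypothetical scenario: 100 diploid individuals, 50 generations of drift.
drift = wright_fisher(n_diploid=100, start_freq=0.2, generations=50)
```

Every generation costs work proportional to the population size, which hints at why forward simulation is more expensive than coalescent approaches and why a rescaling technique is useful for large regions and populations.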


BMC Bioinformatics | 2008

Fregene: simulation of realistic sequence-level data in populations and ascertained samples.

Marc Chadeau-Hyam; Clive J. Hoggart; Paul F. O'Reilly; John C. Whittaker; Maria De Iorio; David J. Balding

Background: FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is implemented in FREGENE and provides the opportunity to test theoretical predictions and gain new insights into mechanisms of selection. We describe here the main functionalities of both FREGENE and SAMPLE, a companion program that can replicate association study datasets.

Results: We report detailed analyses of six large simulated datasets that we have made publicly available. Three demographic scenarios are modelled: one panmictic, one substructured with migration, and one complex scenario that mimics the principal features of genetic variation in major worldwide human populations. For each scenario there is one neutral simulation, and one with a complex pattern of selection.

Conclusion: FREGENE and the simulated datasets will be valuable for assessing the validity of models for selection, demography and population genetic parameters, as well as the efficacy of association studies. Its principal advantages are modelling flexibility and computational efficiency. It is open source and object-oriented. As such, it can be customised and the range of models extended.

Collaboration


Top Co-Authors

Lachlan Coin

University of Queensland

Maria De Iorio

University College London
