William Astle
University of Cambridge
Publications
Featured research published by William Astle.
Science | 2014
Lu Chen; Myrto Kostadima; Joost H.A. Martens; Giovanni Canu; Sara P. Garcia; Ernest Turro; Kate Downes; Iain C. Macaulay; Ewa Bielczyk-Maczyńska; Sophia Coe; Samantha Farrow; Pawan Poudel; Frances Burden; Sjoert B. G. Jansen; William Astle; Antony P. Attwood; Tadbir K. Bariana; Bernard de Bono; Alessandra Breschi; John Chambers; Fizzah Choudry; Laura Clarke; Paul Coupland; Martijn van der Ent; Wendy N. Erber; Joop H. Jansen; Rémi Favier; Matthew Fenech; Nicola S. Foad; Kathleen Freson
Introduction Blood production in humans culminates in the daily release of around 10^11 cells into the circulation, mainly platelets and red blood cells. All blood cells originate from a minute population of hematopoietic stem cells (HSCs) that expands and differentiates into progenitor cells with increasingly restricted lineage choice. Characterizing alternative splicing events involved in hematopoiesis is critical for interpreting the effects of mutations leading to inherited disorders and blood cancers, and for the rational design of strategies to advance transplantation and regenerative medicine.
Rationale To address this, we explored the transcriptional diversity of human blood progenitors by sequencing RNA from six progenitor and two precursor populations representing the classical myeloid commitment stages of hematopoiesis and the main lymphoid stage. Data were aligned to the human reference transcriptome and to the genome to quantify known transcript isoforms and to identify novel splicing events, respectively. We used Bayesian polytomous model selection to classify transcripts into distinct expression patterns across the three cell types that comprise each differentiation step.
Figure: Overview of methodology. RNA-sequencing reads from human blood progenitors [opaque cells in (A)] were mapped to the transcriptome to quantify gene and transcript expression. Reads were also mapped to the genome to identify novel splice junctions and characterize alternative splicing events (B).
Results We identified extensive transcriptional changes involving 6711 genes and 10,724 transcripts and validated a number of them. Many of the changes at the transcript isoform level did not result in significant changes at the gene expression level. Moreover, we identified transcripts unique to each of the progenitor populations, observing enrichment in non–protein-coding elements at the early stages of differentiation.
We discovered 7881 novel splice junctions and 2301 differentially used alternative splicing events, enriched in genes involved in regulatory processes and often resulting in the gain or loss of functional domains. Of the alternative splice sites displaying differential usage, 73% resulted in exon-skipping events involving at least one protein domain (38.5%) or introducing a premature stop codon (26%). Enrichment analysis of RNA-binding motifs provided insights into the regulation of cell type–specific splicing events. To demonstrate the importance of specific isoforms in driving lineage fating events, we investigated the role of a transcription factor highlighted by our analyses. Our data show that nuclear factor I/B (NFIB) is highly expressed in megakaryocytes and that it is transcribed from an unannotated transcription start site preceding a novel exon. The novel NFIB isoform lacks the DNA binding/dimerization domain and is therefore unable to interact with its binding partner, NFIC. We further show that NFIB and NFIC are important in megakaryocyte differentiation.
Conclusion We produced a quantitative catalog of transcriptional changes and splicing events representing the early progenitors of human blood. Our analyses unveil a previously undetected layer of regulation affecting cell fating, which involves transcript isoform switching without noticeable changes at the gene level and results in the gain or loss of protein functions.
A BLUEPRINT of immune cell development
To determine the epigenetic mechanisms that direct blood cells to develop into the many components of our immune system, the BLUEPRINT consortium examined the regulation of DNA and RNA transcription to dissect the molecular traits that govern blood cell differentiation. By inducing immune responses, Saeed et al. document the epigenetic changes in the genome that underlie immune cell differentiation. Cheng et al. demonstrate that trained monocytes are highly dependent on the breakdown of sugars in the presence of oxygen, which allows cells to produce the energy needed to mount an immune response. Chen et al. examine RNA transcripts and find that specific cell lineages use RNA transcripts of different length and composition (isoforms) to form proteins. Together, the studies reveal how epigenetic effects can drive the development of blood cells involved in the immune system.
Science, this issue 10.1126/science.1251086, 10.1126/science.1250684, 10.1126/science.1251033
RNA sequencing identifies how different cell fate decisions are made during blood cell differentiation.
Blood cells derive from hematopoietic stem cells through stepwise fating events. To characterize gene expression programs driving lineage choice, we sequenced RNA from eight primary human hematopoietic progenitor populations representing the major myeloid commitment stages and the main lymphoid stage. We identified extensive cell type–specific expression changes: 6711 genes and 10,724 transcripts, enriched in non–protein-coding elements at early stages of differentiation. In addition, we found 7881 novel splice junctions and 2301 differentially used alternative splicing events, enriched in genes involved in regulatory processes. We experimentally demonstrated cell-specific isoform usage, identifying nuclear factor I/B (NFIB) as a regulator of the maturation of megakaryocytes, the platelet precursors. Our data highlight the complexity of fating events in closely related progenitor populations, the understanding of which is essential for the advancement of transplantation and regenerative medicine.
Cell | 2016
William Astle; Heather Elding; Tao Jiang; Dave Allen; Dace Ruklisa; Alice L. Mann; Daniel Mead; Heleen Bouman; Fernando Riveros-Mckay; Myrto Kostadima; John J. Lambourne; Suthesh Sivapalaratnam; Kate Downes; Kousik Kundu; Lorenzo Bomba; Kim Berentsen; John R. Bradley; Louise C. Daugherty; Olivier Delaneau; Kathleen Freson; Stephen F. Garner; Luigi Grassi; Jose A. Guerrero; Matthias Haimel; Eva M. Janssen-Megens; Anita M. Kaan; Mihir Anant Kamat; Bowon Kim; Amit Mandoli; Jonathan Marchini
Summary Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low-frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease, and evidence suggesting that previously reported population associations between blood cell indices and cardiovascular disease may be non-causal.
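The abstract above does not say which Mendelian randomization estimator was used. As a minimal sketch of the general idea, assuming valid genetic instruments (no horizontal pleiotropy), the fixed-effect inverse-variance-weighted (IVW) estimator combines per-variant Wald ratios into one causal estimate. The function name `ivw_mr` and the summary statistics below are hypothetical and are not taken from the study.

```python
import math

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) Mendelian randomization.

    For each variant j, the Wald ratio beta_outcome[j] / beta_exposure[j]
    estimates the causal effect of exposure on outcome; its first-order
    standard error is se_outcome[j] / |beta_exposure[j]|. The IVW estimate
    is the precision-weighted mean of these ratios.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        ratio = by / bx
        weight = (bx / sy) ** 2  # 1 / se(ratio)^2
        weighted_sum += weight * ratio
        total_weight += weight
    return weighted_sum / total_weight, 1.0 / math.sqrt(total_weight)

# Hypothetical summary statistics for three variants (not from the study).
estimate, se = ivw_mr([0.10, 0.20, 0.05], [0.020, 0.041, 0.009], [0.01, 0.01, 0.01])
print(f"IVW causal estimate = {estimate:.3f} (SE {se:.3f})")
```

Variants with strong effects on the exposure (here the second one) dominate the estimate, because their Wald ratios are the most precise.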
Bioinformatics | 2012
Jie Hao; William Astle; Maria De Iorio; Timothy M. D. Ebbels
MOTIVATION Nuclear Magnetic Resonance (NMR) spectra are widely used in metabolomics to obtain metabolite profiles in complex biological mixtures. Common methods used to assign and estimate concentrations of metabolites involve either expert manual peak fitting or extra pre-processing steps, such as peak alignment and binning. Peak fitting is very time-consuming and is subject to human error. Conversely, alignment and binning can introduce artefacts and limit immediate biological interpretation of models. RESULTS We present the Bayesian automated metabolite analyser for NMR spectra (BATMAN), an R package that deconvolutes peaks from one-dimensional NMR spectra, automatically assigns them to specific metabolites from a target list and obtains concentration estimates. The Bayesian model incorporates information on characteristic peak patterns of metabolites and is able to account for shifts in the position of peaks commonly seen in NMR spectra of biological samples. It applies a Markov chain Monte Carlo algorithm to sample from a joint posterior distribution of the model parameters and obtains concentration estimates with reduced error compared with conventional numerical integration and comparable to manual deconvolution by experienced spectroscopists. AVAILABILITY AND IMPLEMENTATION http://www1.imperial.ac.uk/medicine/people/t.ebbels/
Nature Protocols | 2014
Jie Hao; Manuel Liebeke; William Astle; Maria De Iorio; Jacob G. Bundy; Timothy M. D. Ebbels
Data processing for 1D NMR spectra is a key bottleneck for metabolomic and other complex-mixture studies, particularly where quantitative data on individual metabolites are required. We present a protocol for automated metabolite deconvolution and quantification from complex NMR spectra by using the Bayesian automated metabolite analyzer for NMR (BATMAN) R package. BATMAN models resonances on the basis of a user-controllable set of templates, each of which specifies the chemical shifts, J-couplings and relative peak intensities for a single metabolite. Peaks are allowed to shift position slightly between spectra, and peak widths are allowed to vary by user-specified amounts. NMR signals not captured by the templates are modeled non-parametrically by using wavelets. The protocol covers setting up user template libraries, optimizing algorithmic input parameters, improving prior information on peak positions, quality control and evaluation of outputs. The outputs include relative concentration estimates for named metabolites together with associated Bayesian uncertainty estimates, as well as the fit of the remainder of the spectrum using wavelets. Graphical diagnostics allow the user to examine the quality of the fit for multiple spectra simultaneously. This approach offers a workflow to analyze large numbers of spectra and is expected to be useful in a wide range of metabolomics studies.
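To illustrate the template idea described above (each metabolite contributes a fixed pattern of peaks, scaled by its concentration), here is a deliberately simplified sketch. It is not BATMAN: BATMAN is Bayesian, samples with MCMC, lets peaks shift and widths vary, and models the remainder with wavelets. This sketch assumes fixed Lorentzian peaks at known positions and recovers the two scale factors by ordinary least squares. All function names, chemical shifts and amplitudes are hypothetical.

```python
import random

def lorentzian(x, center, half_width):
    """Unit-height Lorentzian line shape centred at `center` (ppm)."""
    return half_width ** 2 / ((x - center) ** 2 + half_width ** 2)

def template_signal(ppm_grid, peaks, half_width=0.002):
    """Evaluate a hypothetical metabolite template as a sum of Lorentzians.

    `peaks` is a list of (chemical_shift_ppm, relative_intensity) pairs, a
    simplified stand-in for the shifts, J-coupling patterns and relative
    intensities that a real template would specify.
    """
    return [sum(h * lorentzian(x, c, half_width) for c, h in peaks) for x in ppm_grid]

def fit_two_templates(spectrum, t1, t2):
    """Least-squares amplitudes c1, c2 for spectrum ~ c1*t1 + c2*t2 (Cramer's rule)."""
    a11 = sum(u * u for u in t1)
    a12 = sum(u * v for u, v in zip(t1, t2))
    a22 = sum(v * v for v in t2)
    b1 = sum(u * s for u, s in zip(t1, spectrum))
    b2 = sum(v * s for v, s in zip(t2, spectrum))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

# Hypothetical templates: a doublet and a singlet that overlaps it.
grid = [1.20 + i * 0.0005 for i in range(501)]
doublet = template_signal(grid, [(1.320, 1.0), (1.330, 1.0)])
singlet = template_signal(grid, [(1.327, 1.0)])

# Synthetic spectrum with known amplitudes 2.0 and 0.5 plus small Gaussian noise.
rng = random.Random(0)
spectrum = [2.0 * d + 0.5 * s + rng.gauss(0.0, 0.01) for d, s in zip(doublet, singlet)]

c_doublet, c_singlet = fit_two_templates(spectrum, doublet, singlet)
print(f"doublet amplitude ~ {c_doublet:.2f}, singlet amplitude ~ {c_singlet:.2f}")
```

Even though the singlet sits between the two doublet lines, the distinct peak patterns let the fit separate the two contributions; allowing peak positions to shift between spectra, as BATMAN does, makes the problem nonlinear and motivates its sampling-based approach.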
Blood | 2016
Simon Stritt; Paquita Nurden; Ernest Turro; Daniel Greene; Sjoert B. G. Jansen; Sarah K. Westbury; Romina Petersen; William Astle; Sandrine Marlin; Tadbir K. Bariana; Myrto Kostadima; Claire Lentaigne; Stephanie Maiwald; Sofia Papadia; Anne M. Kelly; Jonathan Stephens; Christopher J. Penkett; Sofie Ashford; Salih Tuna; Steve Austin; Tamam Bakchoul; Peter William Collins; Rémi Favier; Michele P. Lambert; Mary Mathias; Carolyn M. Millar; Rutendo Mapeta; David J. Perry; Sol Schulman; Ilenia Simeoni
Macrothrombocytopenia (MTP) is a heterogeneous group of disorders characterized by enlarged circulating platelets that are reduced in number, sometimes resulting in abnormal bleeding. In most MTP, this phenotype arises because of altered regulation of platelet formation from megakaryocytes (MKs). We report the identification of DIAPH1, which encodes the Rho-effector diaphanous-related formin 1 (DIAPH1), as a candidate gene for MTP using exome sequencing, ontological phenotyping, and similarity regression. We describe 2 unrelated pedigrees with MTP and sensorineural hearing loss that segregate with a DIAPH1 R1213* variant predicting partial truncation of the DIAPH1 diaphanous autoregulatory domain. The R1213* variant was linked to reduced proplatelet formation from cultured MKs, cell clustering, and abnormal cortical filamentous actin. Similarly, in platelets, there was increased filamentous actin and stable microtubules, indicating constitutive activation of DIAPH1. Overexpression of DIAPH1 R1213* in cells reproduced the cytoskeletal alterations found in platelets. Our description of a novel disorder of platelet formation and hearing loss extends the repertoire of DIAPH1-related disease and provides new insight into the autoregulation of DIAPH1 activity.
BMC Cardiovascular Disorders | 2013
Kenan Direk; Marina Cecelja; William Astle; Phil Chowienczyk; Tim D. Spector; Mario Falchi; Toby Andrew
Background Excess accumulation of visceral fat is a prominent risk factor for cardiovascular and metabolic morbidity. While computed tomography (CT) is the gold standard for measuring visceral adiposity, it is often not feasible in large studies, so valid but less expensive and less intrusive proxy measures of visceral fat, such as dual-energy X-ray absorptiometry (DXA), are required. The study aims were to (a) identify a valid DXA-based measure of visceral adipose tissue (VAT), (b) estimate VAT heritability and (c) assess the association of visceral fat with morbidity in relation to body fat distribution.
Methods A validation sample of 54 females with detailed body fat composition measurements (assessed using CT, DXA and anthropometry) was used to evaluate previously published models for predicting CT-measured visceral fat. Based on a validated model, we derived an out-of-sample estimate of abdominal VAT area for a study sample of 3457 female volunteer twins and estimated VAT area heritability using a classical twin study design. Regression and residual analyses were used to assess the relationship between adiposity and morbidity.
Results Published models applied to the validation sample explained >80% of the variance in CT-measured visceral fat. While CT visceral fat was best estimated using a linear regression on waist circumference, CT body cavity area and total abdominal fat (R² = 0.91), anthropometric measures alone predicted VAT almost equally well (CT body cavity area and waist circumference, R² = 0.86). Narrow-sense VAT area heritability for the study sample was estimated to be 58% (95% CI: 51-66%), with a shared familial component of 24% (17-30%). VAT area is strongly associated with type 2 diabetes (T2D), hypertension (HT), subclinical atherosclerosis and liver function tests. In particular, VAT area is associated with T2D, HT and liver function (alanine transaminase) independently of DXA total abdominal fat and body mass index (BMI).
Conclusions DXA and anthropometric measures can be used to derive estimates of visceral fat as a reliable alternative to CT. Visceral fat is heritable and appears to mediate the association between body adiposity and morbidity. This observation is consistent with the hypothesis that excess visceral adiposity is causally related to cardiovascular and metabolic disease.
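The abstract reports a narrow-sense heritability and a shared familial component from a classical twin design but does not state how they were estimated; such studies usually fit an ACE model by maximum likelihood. As a rough sketch of the underlying logic, Falconer's method-of-moments formulas decompose trait variance using the correlations within monozygotic (MZ) and dizygotic (DZ) twin pairs. The function name and the correlations below are hypothetical, chosen only for illustration.

```python
def falconer_ace(r_mz, r_dz):
    """Method-of-moments ACE variance decomposition from twin correlations.

    r_mz, r_dz: trait correlations within monozygotic and dizygotic pairs.
    Returns proportions of phenotypic variance due to additive genetic
    effects (a2, narrow-sense heritability), shared environment (c2) and
    unique environment plus measurement error (e2).
    """
    a2 = 2.0 * (r_mz - r_dz)  # MZ pairs share twice the additive variance of DZ pairs
    c2 = 2.0 * r_dz - r_mz    # equivalently r_mz - a2
    e2 = 1.0 - r_mz           # whatever MZ co-twins do not share
    return a2, c2, e2

# Hypothetical correlations, chosen for illustration only.
a2, c2, e2 = falconer_ace(r_mz=0.70, r_dz=0.40)
print(f"a2={a2:.2f}, c2={c2:.2f}, e2={e2:.2f}")  # a2=0.60, c2=0.10, e2=0.30
```

The three components always sum to one, and the "shared familial component" reported above appears to correspond to c2 in this decomposition; a likelihood-based ACE fit additionally yields confidence intervals such as those quoted in the abstract.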
Bioinformatics | 2014
Ernest Turro; William Astle; Simon Tavaré
MOTIVATION Most methods for estimating differential expression from RNA-seq are based on statistics that compare normalized read counts between treatment classes. Unfortunately, reads are in general too short to be mapped unambiguously to features of interest, such as genes, isoforms or haplotype-specific isoforms. There are methods for estimating expression levels that account for this source of ambiguity. However, the uncertainty is not generally accounted for in downstream analysis of gene expression experiments. Moreover, at the individual transcript level, it can sometimes be too large to allow useful comparisons between treatment groups. RESULTS In this article we make two proposals that improve the power, specificity and versatility of expression analysis using RNA-seq data. First, we present a Bayesian method for model selection that accounts for read mapping ambiguities using random effects. This polytomous model selection approach can be used to identify many interesting patterns of gene expression and is not confined to detecting differential expression between two groups. For illustration, we use our method to detect imprinting, different types of regulatory divergence in cis and in trans, and differential isoform usage, but many other applications are possible. Second, we present a novel collapsing algorithm for grouping transcripts into inferential units that exploits the posterior correlation between transcript expression levels. The aggregate expression levels of these units can be estimated with useful levels of uncertainty. Our algorithm can improve the precision of expression estimates when uncertainty is large, with only a small reduction in biological resolution. AVAILABILITY AND IMPLEMENTATION We have implemented our software in the mmdiff and mmcollapse multithreaded C++ programs as part of the open-source MMSEQ package, available at https://github.com/eturro/mmseq.
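Why does collapsing transcripts help? When two isoforms compete for the same reads, their posterior expression draws are strongly negatively correlated, and Var(A + B) = Var(A) + Var(B) + 2 Cov(A, B) is then much smaller than either variance alone. The sketch below demonstrates only this principle on simulated posterior draws; it is not the mmcollapse algorithm, and the simulation settings and the `pearson` helper are hypothetical.

```python
import random
import statistics

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical posterior draws for two isoforms of one gene that share most
# of their reads: the gene total is well determined, the split is not.
rng = random.Random(1)
iso_a, iso_b = [], []
for _ in range(5000):
    total = rng.gauss(100.0, 2.0)   # gene-level expression
    frac = rng.uniform(0.2, 0.8)    # poorly identified isoform split
    iso_a.append(total * frac)
    iso_b.append(total * (1.0 - frac))
collapsed = [a + b for a, b in zip(iso_a, iso_b)]

print(f"corr(A, B)          = {pearson(iso_a, iso_b):.2f}")
print(f"posterior sd of A   = {statistics.stdev(iso_a):.1f}")
print(f"posterior sd of B   = {statistics.stdev(iso_b):.1f}")
print(f"posterior sd of A+B = {statistics.stdev(collapsed):.1f}")
```

The individual isoforms carry large posterior uncertainty, whereas their sum is estimated precisely; this is the trade-off described above between precision and a small loss of biological resolution.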
Journal of the American Statistical Association | 2012
William Astle; Maria De Iorio; Sylvia Richardson; David A. Stephens; Timothy M. D. Ebbels
Nuclear magnetic resonance (NMR) spectra are widely used in metabolomics to obtain profiles of metabolites dissolved in biofluids such as cell supernatants. Methods for estimating metabolite concentrations from these spectra are presently confined to manual peak fitting and to binning procedures for integrating resonance peaks. Extensive information on the patterns of spectral resonance generated by human metabolites is now available in online databases. By incorporating this information into a Bayesian model, we can deconvolve resonance peaks from a spectrum and obtain explicit concentration estimates for the corresponding metabolites. Spectral resonances that cannot be deconvolved in this way may also be of scientific interest, so we model them jointly using wavelets. We describe a Markov chain Monte Carlo algorithm that allows us to sample from the joint posterior distribution of the model parameters, using specifically designed block updates to improve mixing. The strong prior on resonance patterns allows the algorithm to identify peaks corresponding to particular metabolites automatically, eliminating the need for manual peak assignment. We assess the performance of our method for peak alignment and concentration estimation. Except when the target resonance signal is very weak, alignment is unbiased and precise. We compare the Bayesian concentration estimates with those obtained from a conventional numerical integration method and find that our point estimates have six-fold lower mean squared error. Finally, we apply our method to a spectral dataset taken from an investigation of the metabolic response of yeast to recombinant protein expression. We estimate the concentrations of 26 metabolites and compare them with manual quantification by five expert spectroscopists. We discuss the reasons for discrepancies and the robustness of our method's concentration estimates. This article has supplementary materials online.
Nature Genetics | 2016
Valentina Iotchkova; Jie Huang; John A. Morris; Deepti Jain; Caterina Barbieri; Klaudia Walter; Josine L. Min; Lu Chen; William Astle; Massimilian Cocca; Patrick Deelen; Heather Elding; Aliki-Eleni Farmaki; Christopher S. Franklin; Tom R. Gaunt; Albert Hofman; Tao Jiang; Marcus E. Kleber; Genevieve Lachance; Jian'an Luan; Giovanni Malerba; Angela Matchan; Daniel Mead; Yasin Memari; Ioanna Ntalla; Kalliope Panoutsopoulou; Raha Pazoki; John Perry; Fernando Rivadeneira; Maria Sabater-Lleal
Large-scale whole-genome sequence data sets offer novel opportunities to identify genetic variation underlying human traits. Here we apply genotype imputation based on whole-genome sequence data from the UK10K and 1000 Genomes Project into 35,981 study participants of European ancestry, followed by association analysis with 20 quantitative cardiometabolic and hematological traits. We describe 17 new associations, including 6 rare (minor allele frequency (MAF) < 1%) or low-frequency (1% < MAF < 5%) variants with platelet count (PLT), red blood cell indices (MCH and MCV) and HDL cholesterol. Applying fine-mapping analysis to 233 known and new loci associated with the 20 traits, we resolve the associations of 59 loci to credible sets of 20 or fewer variants and describe trait enrichments within regions of predicted regulatory function. These findings improve understanding of the allelic architecture of risk factors for cardiometabolic and hematological diseases and provide additional functional insights with the identification of potentially novel biological targets.
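The abstract resolves loci to "credible sets of 20 or fewer variants" without specifying the fine-mapping method. A common approach, assumed here, supposes exactly one causal variant per region with equal prior probabilities, converts each variant's effect estimate into Wakefield's approximate Bayes factor, normalizes these into posterior probabilities, and keeps the top-ranked variants until their cumulative probability reaches 95%. The prior standard deviation of 0.2, the helper names and the variant data are all hypothetical.

```python
import math

def wakefield_log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor for association vs. no association.

    Wakefield's approximation with a N(0, prior_sd^2) prior on the effect:
    ABF = sqrt(V / (V + W)) * exp(z^2 * W / (2 * (V + W))), V = se^2, W = prior_sd^2.
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta / se
    return 0.5 * math.log(v / (v + w)) + 0.5 * z * z * w / (v + w)

def credible_set(variant_ids, betas, ses, coverage=0.95, prior_sd=0.2):
    """Smallest set of top-ranked variants whose posterior probabilities reach `coverage`.

    Assumes exactly one causal variant in the region and equal prior
    probabilities, so each variant's posterior probability is its Bayes
    factor divided by the sum of Bayes factors.
    """
    log_bfs = [wakefield_log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(log_bfs)
    weights = [math.exp(lb - top) for lb in log_bfs]  # subtract max to avoid overflow
    total = sum(weights)
    ranked = sorted(zip(variant_ids, (w / total for w in weights)),
                    key=lambda pair: pair[1], reverse=True)
    selected, cumulative = [], 0.0
    for vid, prob in ranked:
        selected.append((vid, prob))
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected

# Hypothetical region with five variants (identifiers and effects are made up).
region = credible_set(
    ["var1", "var2", "var3", "var4", "var5"],
    betas=[0.30, 0.295, 0.12, 0.05, 0.01],
    ses=[0.03, 0.03, 0.03, 0.03, 0.03],
)
for vid, prob in region:
    print(vid, round(prob, 3))
```

Two variants with nearly identical evidence end up sharing the posterior mass, so neither can be singled out; the size of the credible set is a direct measure of how well a locus has been resolved.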
Journal of Clinical Investigation | 2017
Irina Pleines; Joanne Woods; Stephane Chappaz; Verity Kew; Nicola S. Foad; José Ballester-Beltrán; Katja Aurbach; Chiara Lincetto; Rachael M. Lane; Galina Schevzov; Warren S. Alexander; Douglas J. Hilton; William Astle; Kate Downes; Paquita Nurden; Sarah K. Westbury; Andrew D Mumford; Samya Obaji; Peter William Collins; NIHR BioResource; Fabien Delerue; Lars M. Ittner; Nicole S. Bryce; Mira Holliday; Christine A. Lucas; Edna C. Hardeman; Willem H. Ouwehand; Peter Gunning; Ernest Turro; Marloes R. Tijssen
Platelets are anuclear cells that are essential for blood clotting. They are produced by large polyploid precursor cells called megakaryocytes. Previous genome-wide association studies in nearly 70,000 individuals indicated that single nucleotide variants (SNVs) in the gene encoding the actin cytoskeletal regulator tropomyosin 4 (TPM4) exert an effect on the count and volume of platelets. Platelet number and volume are independent risk factors for heart attack and stroke. Here, we have identified 2 unrelated families in the BRIDGE Bleeding and Platelet Disorders (BPD) collection who carry a TPM4 variant that causes truncation of the TPM4 protein and segregates with macrothrombocytopenia, a disorder characterized by low platelet count. N-Ethyl-N-nitrosourea–induced (ENU-induced) missense mutations in Tpm4 or targeted inactivation of the Tpm4 locus led to gene dosage–dependent macrothrombocytopenia in mice. All other blood cell counts in Tpm4-deficient mice were normal. Insufficient TPM4 expression in human and mouse megakaryocytes resulted in a defect in the terminal stages of platelet production and had a mild effect on platelet function. Together, our findings demonstrate a nonredundant role for TPM4 in platelet biogenesis in humans and mice and reveal that truncating variants in TPM4 cause a previously undescribed dominant Mendelian platelet disorder.