Zemin Ning
Wellcome Trust Sanger Institute
Publications
Featured research published by Zemin Ning.
Nature | 2010
Erin Pleasance; R. Keira Cheetham; Philip Stephens; David J. McBride; Sean Humphray; Christopher Greenman; Ignacio Varela; Meng-Lay Lin; Gonzalo R. Ordóñez; Graham R. Bignell; Kai Ye; Julie A Alipaz; Markus J. Bauer; David Beare; Adam Butler; Richard J. Carter; Lina Chen; Anthony J. Cox; Sarah Edkins; Paula Kokko-Gonzales; Niall Anthony Gormley; Russell Grocock; Christian D. Haudenschild; Matthew M. Hims; Terena James; Mingming Jia; Zoya Kingsbury; Catherine Leroy; John Marshall; Andrew Menzies
All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.
Bioinformatics | 2009
Kai Ye; Marcel H. Schulz; Quan Long; Rolf Apweiler; Zemin Ning
Motivation: There is a strong demand in the genomic community to develop effective algorithms to reliably identify genomic variants. Indel detection using next-gen data is difficult and identification of long structural variations is extremely challenging. Results: We present Pindel, a pattern growth approach, to detect breakpoints of large deletions and medium-sized insertions from paired-end short reads. We use both simulated reads and real data to demonstrate the efficiency of the computer program and accuracy of the results. Availability: The binary code and a short user manual can be freely downloaded from http://www.ebi.ac.uk/~kye/pindel/.
Nature | 2009
Matthew Berriman; Brian J. Haas; Philip T. LoVerde; R. Alan Wilson; Gary P. Dillon; Gustavo C. Cerqueira; Susan T. Mashiyama; Bissan Al-Lazikani; Luiza F. Andrade; Peter D. Ashton; Martin Aslett; Daniella Castanheira Bartholomeu; Gaëlle Blandin; Conor R. Caffrey; Avril Coghlan; Richard M. R. Coulson; Tim A. Day; Arthur L. Delcher; Ricardo DeMarco; Appoliniare Djikeng; Tina Eyre; John Gamble; Elodie Ghedin; Yong-Hong Gu; Christiane Hertz-Fowler; Hirohisha Hirai; Yuriko Hirai; Robin Houston; Alasdair Ivens; David A. Johnston
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provides new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
Nature | 2009
Yan Zhou; Huajun Zheng; Yangyi Chen; Lei Zhang; Kai Wang; Jing Guo; Zhen Huang; Bo Zhang; Wei Huang; Ke Jin; Tonghai Dou; Masami Hasegawa; Wang L; Yuan Zhang; Jie Zhou; Lin Tao; Zhiwei Cao; Yixue Li; Tomas Vinar; Brona Brejova; Daniel G. Brown; Ming Li; David J. Miller; David Blair; Yang Zhong; Zhu Chen; Feng Liu; Wei Hu; Zhi-Qin Wang; Qin-Hua Zhang
Schistosoma japonicum is a parasitic flatworm that causes human schistosomiasis, which is a significant cause of morbidity in China and the Philippines. Here we present a draft genomic sequence for the worm. The genome provides a global insight into the molecular architecture and host interaction of this complex metazoan pathogen, revealing that it can exploit host nutrients, neuroendocrine hormones and signalling pathways for growth, development and maturation. Having a complex nervous system and a well-developed sensory system, S. japonicum can accept stimulation of the corresponding ligands as a physiological response to different environments, such as fresh water or the tissues of its intermediate and mammalian hosts. Numerous proteases, including cercarial elastase, are implicated in mammalian skin penetration and haemoglobin degradation. The genomic information will serve as a valuable platform to facilitate development of new interventions for schistosomiasis control.
Nature | 2012
Aylwyn Scally; Julien Y. Dutheil; LaDeana W. Hillier; Gregory Jordan; Ian Goodhead; Javier Herrero; Asger Hobolth; Tuuli Lappalainen; Thomas Mailund; Tomas Marques-Bonet; Shane McCarthy; Stephen H. Montgomery; Petra C. Schwalie; Y. Amy Tang; Michelle C. Ward; Yali Xue; Bryndis Yngvadottir; Can Alkan; Lars Nørvang Andersen; Qasim Ayub; Edward V. Ball; Kathryn Beal; Brenda J. Bradley; Yuan Chen; Chris Clee; Stephen Fitzgerald; Tina Graves; Yong Gu; Paul Heath; Andreas Heger
Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human–chimpanzee and human–chimpanzee–gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.
Nature Methods | 2009
Iwanka Kozarewa; Zemin Ning; Michael A. Quail; Mandy Sanders; Matthew Berriman; Daniel J. Turner
Amplification artifacts introduced during library preparation for the Illumina Genome Analyzer increase the likelihood that an appreciable proportion of these sequences will be duplicates and cause an uneven distribution of read coverage across the targeted sequencing regions. As a consequence, these unfavorable features result in difficulties in genome assembly and variation analysis from the short reads, particularly when the sequences are from genomes with base compositions at the extremes of high or low G+C content. Here we present an amplification-free method of library preparation, in which the cluster amplification step, rather than the PCR, enriches for fully ligated template strands, reducing the incidence of duplicate sequences, improving read mapping and single nucleotide polymorphism calling and aiding de novo assembly. We illustrate this by generating and analyzing DNA sequences from extremely (G+C)-poor (Plasmodium falciparum), (G+C)-neutral (Escherichia coli) and (G+C)-rich (Bordetella pertussis) genomes.
Powder Technology | 1998
Colin Thornton; Zemin Ning
The paper considers the normal impact of elastic-perfectly plastic spheres, with and without interface adhesion, and presents an analytical solution for the coefficient of restitution which is expressed in terms of the impact velocity, the critical sticking velocity and the velocity below which the interaction is assumed to be elastic.
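To illustrate how a coefficient of restitution can depend on the impact velocity and a critical sticking velocity, here is a minimal sketch covering only a purely elastic, adhesive contact. It assumes (this is not stated in the abstract) that adhesion dissipates a fixed amount of energy equal to the kinetic energy at the sticking velocity, so that an energy balance gives e = sqrt(1 - (Vs/Vi)^2). It is not the paper's full elastic-perfectly plastic solution, and the function and parameter names are hypothetical.

```python
import math


def restitution_elastic_adhesive(impact_velocity: float,
                                 sticking_velocity: float) -> float:
    """Coefficient of restitution for an elastic, adhesive normal impact.

    Assumption (illustrative only): adhesion removes a fixed energy
    0.5 * m * Vs**2 per impact, so the rebound kinetic energy is
    0.5 * m * (Vi**2 - Vs**2). Plastic deformation is ignored.
    """
    if impact_velocity <= 0:
        raise ValueError("impact_velocity must be positive")
    if sticking_velocity < 0:
        raise ValueError("sticking_velocity must be non-negative")
    # At or below the sticking velocity the sphere does not rebound.
    if impact_velocity <= sticking_velocity:
        return 0.0
    return math.sqrt(1.0 - (sticking_velocity / impact_velocity) ** 2)


if __name__ == "__main__":
    for vi in (1.0, 3.0, 5.0, 50.0):
        print(vi, restitution_elastic_adhesive(vi, sticking_velocity=3.0))
```

Under this assumption the coefficient is zero up to the sticking velocity and approaches one as the impact velocity grows; a plastic regime above the yield velocity would reduce it again, which this sketch does not model.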
Genome Research | 2011
Dent Earl; Keith Bradnam; John St. John; Aaron E. Darling; Dawei Lin; Joseph Fass; Hung On Ken Yu; Vince Buffalo; Daniel R. Zerbino; Mark Diekhans; Ngan Nguyen; Pramila Ariyaratne; Wing-Kin Sung; Zemin Ning; Matthias Haimel; Jared T. Simpson; Nuno A. Fonseca; Inanc Birol; T. Roderick Docking; Isaac Ho; Daniel S. Rokhsar; Rayan Chikhi; Dominique Lavenier; Guillaume Chapuis; Delphine Naquin; Nicolas Maillet; Michael C. Schatz; David R. Kelley; Adam M. Phillippy; Sergey Koren
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.
GigaScience | 2013
Keith Bradnam; Joseph Fass; Anton Alexandrov; Paul Baranay; Michael Bechner; Inanc Birol; Sébastien Boisvert; Jarrod Chapman; Guillaume Chapuis; Rayan Chikhi; Hamidreza Chitsaz; Wen Chi Chou; Jacques Corbeil; Cristian Del Fabbro; Roderick R. Docking; Richard Durbin; Dent Earl; Scott J. Emrich; Pavel Fedotov; Nuno A. Fonseca; Ganeshkumar Ganapathy; Richard A. Gibbs; Sante Gnerre; Élénie Godzaridis; Steve Goldstein; Matthias Haimel; Giles Hall; David Haussler; Joseph Hiatt; Isaac Ho
Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
Nucleic Acids Research | 2009
Genís Parra; Keith Bradnam; Zemin Ning; Thomas M. Keane; Ian Korf
Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs), that are extremely highly conserved and which we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and complements the commonly used N50 length and x-fold coverage values.
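For readers unfamiliar with the two conventional metrics named above, the following is a minimal sketch of how N50 length and x-fold coverage are commonly computed. The function names and the example numbers are hypothetical and are not taken from the paper; N50 is taken here in its usual sense, the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def fold_coverage(total_read_bases: int, genome_size: int) -> float:
    """Return average x-fold coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]  # hypothetical contig lengths
    print("N50:", n50(contigs))
    print("coverage:", fold_coverage(3_000_000, 100_000))
```

Both values describe contiguity and sequencing depth only; neither says whether genes are actually present in the assembly, which is the gap the CEG proportion is meant to fill.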