Douglas Chesters
Chinese Academy of Sciences
Publication
Featured research published by Douglas Chesters.
PLOS ONE | 2012
Qing-Yan Dai; Qiang Gao; Chun-Sheng Wu; Douglas Chesters; Chao-Dong Zhu; Aibing Zhang
Compared with distinct species, closely related species pose a great challenge for phylogeny reconstruction and species identification with DNA barcoding because their genetic variation often overlaps. We tested a sibling species group of pine moth pests in China with the standard cytochrome c oxidase subunit I (COI) gene and two alternative internal transcribed spacer (ITS) genes (ITS1 and ITS2). Five different phylogenetic/DNA barcoding analysis methods (Maximum likelihood (ML)/Neighbor-joining (NJ), “best close match” (BCM), Minimum distance (MD), and BP-based method (BP)), representing commonly used methodology (tree-based and non-tree based) in the field, were applied to both single-gene and multiple-gene analyses. Our results demonstrated clear reciprocal species monophyly for three relatively distantly related species, Dendrolimus superans, D. houi and D. kikuchii, as recovered by both single and multiple genes, while the phylogenetic relationships of three closely related species, D. punctatus, D. tabulaeformis and D. spectabilis, could not be resolved with the traditional tree-building methods. Additionally, we find that the standard COI barcode outperforms the two nuclear ITS genes, regardless of the method used. On average, the COI barcode achieved a success rate of 94.10–97.40%, while ITS1 and ITS2 obtained a success rate of 64.70–81.60%, indicating that ITS genes are less suitable for species identification in this case. We propose the use of an overall success rate of species identification that takes both sequencing success and assignment success into account, since species identification success rates with a multiple-gene barcoding system were generally overestimated, especially by tree-based methods, in which only successfully sequenced specimens were used to construct a phylogenetic tree. Non-tree based methods, such as the MD, BCM, and BP approaches, presented advantages over tree-based methods by reporting the overall success rates with statistical significance.
In addition, our results indicate that the most closely related species, D. punctatus, D. tabulaeformis and D. spectabilis, may still be undergoing incomplete lineage sorting, with occasional hybridization occurring among them.
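The abstract argues that identification success should be reported over all specimens, including those that failed to sequence, rather than over sequenced specimens only. A minimal Python sketch of that bookkeeping, paired with a minimum distance (MD) assignment, is given below. It assumes pre-aligned sequences and uses uncorrected p-distance; the function names and the toy labels `sp_A`/`sp_B` are hypothetical, and this is not the authors' code.

```python
from typing import Dict, List, Optional


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Only positions where both sequences carry an unambiguous base (A, C, G, T) are compared.
    """
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def minimum_distance_assign(query: str, references: Dict[str, List[str]]) -> Optional[str]:
    """Minimum distance (MD): give the query the species label of its closest reference."""
    best_species: Optional[str] = None
    best_dist = float("inf")
    for species, seqs in references.items():
        for ref in seqs:
            d = p_distance(query, ref)
            if d < best_dist:
                best_species, best_dist = species, d
    return best_species


def overall_success_rate(true_labels: List[str], assigned: List[Optional[str]]) -> float:
    """Correct assignments divided by ALL specimens.

    None marks a specimen that failed to sequence; it counts as an identification
    failure instead of being dropped from the denominator.
    """
    correct = sum(1 for t, a in zip(true_labels, assigned) if a == t)
    return correct / len(true_labels)


refs = {"sp_A": ["ACGTACGTAA"], "sp_B": ["ACGTTCGTTA"]}  # toy sequences, not real data
print(minimum_distance_assign("ACGTACGTAT", refs))  # sp_A
truth = ["sp_A", "sp_A", "sp_B", "sp_B"]
calls = ["sp_A", None, "sp_B", "sp_A"]  # the second specimen failed to sequence
print(overall_success_rate(truth, calls))  # 0.5
```

In the toy run, a rate computed over sequenced specimens only would be 2/3, whereas the overall rate that counts the sequencing failure is 0.5; this gap is the overestimation the abstract describes.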
PLOS ONE | 2013
Zhe Zhao; Tian-Juan Su; Douglas Chesters; Shi-di Wang; Simon Y. W. Ho; Chao-Dong Zhu; Xiao lin Chen; Chun-tian Zhang
Tachinid flies are natural enemies of many lepidopteran and coleopteran pests of forests, crops, and fruit trees. In order to address the lack of genetic data in this economically important group, we sequenced the complete mitochondrial genome of the Palaearctic tachinid fly Elodia flavipalpis Aldrich, 1933. Usually found in Northern China and Japan, this species is one of the primary natural enemies of the leaf-roller moths (Tortricidae), which are major pests of various fruit trees. The 14,932-bp mitochondrial genome was typical of Diptera, with 13 protein-coding genes, 22 tRNA genes, and 2 rRNA genes. However, its control region is only 105 bp in length, which is the shortest found so far in flies. In order to estimate dipteran evolutionary relationships, we conducted a phylogenetic analysis of 58 mitochondrial genomes from 23 families. Maximum-likelihood and Bayesian methods supported the monophyly of both Tachinidae and superfamily Oestroidea. Within the subsection Calyptratae, Muscidae was inferred as the sister group to Oestroidea. Within Oestroidea, Calliphoridae and Sarcophagidae formed a sister clade to Oestridae and Tachinidae. Using a Bayesian relaxed clock calibrated with fossil data, we estimated that Tachinidae originated in the middle Eocene.
PLOS ONE | 2012
Douglas Chesters; Ying Wang; Fang Yu; Ming Bai; Tong-Xin Zhang; Hao-Yuan Hu; Chao Dong Zhu; Cheng-De Li; Yan-Zhou Zhang
Integrated taxonomy uses evidence from a number of different character types to delimit species and other natural groupings. While this approach has been advocated recently, and should be of particular utility in the case of diminutive insect parasitoids, there are relatively few examples of its application in these taxa. Here, we use an integrated framework to delimit independent lineages in Encyrtus sasakii (Hymenoptera: Chalcidoidea: Encyrtidae), a parasitoid morphospecies previously considered a host generalist. Sequence variation at the DNA barcode (cytochrome c oxidase I, COI) and nuclear 28S rDNA loci was compared with morphometric recordings and mating compatibility tests among samples of this species complex collected from its four scale insect hosts, covering a broad geographic range of northern and central China. Our results reveal that Encyrtus sasakii comprises three lineages that, while sharing a similar morphology, are highly divergent at the molecular level. At the barcode locus, the median K2P molecular distance between individuals from three primary populations was found to be 11.3%, well outside the divergence usually observed between Chalcidoidea conspecifics (0.5%). Corroborative evidence that the genetic lineages represent independent species was found from mating tests, where compatibility was observed only within populations, and from morphometric analysis, which found that despite apparent morphological homogeneity, populations clustered according to forewing shape. The independent lineages defined by the integrated analysis correspond to the three scale insect hosts, suggesting the presence of host-specific cryptic species. The finding of hidden host specificity in this species complex demonstrates the critical role that DNA barcoding will increasingly play in revealing hidden biodiversity in taxa that present difficulties for traditional taxonomic approaches.
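The 11.3% figure is a Kimura two-parameter (K2P) distance, the correction conventionally applied to COI barcodes. A minimal Python sketch of the standard formula, d = -½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transition and transversion differences, is shown below. It assumes pre-aligned sequences and skips gaps and ambiguity codes, which is one common convention and not necessarily the one used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    P = proportion of transitions (A<->G, C<->T), Q = proportion of transversions.
    Sites with gaps or ambiguity codes in either sequence are skipped.
    """
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        same_class = (a in PURINES and b in PURINES) or (a in PYRIMIDINES and b in PYRIMIDINES)
        if same_class:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116 (p-distance would be 0.1)
```

A median over all between-population pairs, as reported in the abstract, could then be taken with `statistics.median` from the standard library.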
PLOS ONE | 2012
Liang Lu; Douglas Chesters; Wen Zhang; Guichang Li; Ying Ma; Huailei Ma; Xiuping Song; Haixia Wu; Fengxia Meng; Chao-Dong Zhu; Qiyong Liu
Although mammals are a well-studied group of animals, accurate field identification of small mammals remains difficult because of morphological variation across developmental stages, color variation of pelages, and often damaged osteological and dental characteristics. In 2008, small mammals were collected for an epidemiological study of a spotted fever outbreak in Hainan, China. Ten species of small mammals were identified by morphological characters in the field, most using pelage color characters only. The study is extended here in order to assess whether DNA barcoding would be suitable as an identification tool in these small mammals. Barcode clusters showed some incongruence with morphospecies, especially for some species of Rattus and Niviventer, so molecular delineation was carried out with an expanded dataset of combined cytochrome b (Cyt-b) and cytochrome c oxidase subunit I (COI) sequences. COI sequences were successfully amplified from 83% of collected mammals, but amplification failed in all specimens of Suncus murinus, which were thus excluded from the DNA barcoding analysis. Of note, ten molecular taxonomic units were found among samples of nine morphologically identified species. Accordingly, 11 species of small mammals were present in the investigated areas, including four Rattus species, three Niviventer species, Callosciurus erythraeus, Neohylomys hainanensis, Tupaia belangeri, and Suncus murinus. Based on the results of the phylogenetic and molecular delineation analyses, the systematic status of some rodent species should be redefined. R. rattus hainanicus and R. rattus sladeni are synonyms of R. andamanensis. R. losea from China and Southeast Asia comprises two independent species: R. losea and R. sakeratensis. Finally, the taxonomic status of three putative species of Niviventer should be further confirmed using morphological, molecular and ecological characters.
Zoologica Scripta | 2015
Liang Lu; Deyan Ge; Douglas Chesters; Simon Y. W. Ho; Ying Ma; Guichang Li; Zhixin Wen; Yongjie Wu; Jun Wang; Lin Xia; Jingli Liu; Tianyu Guo; Xiaolong Zhang; Chao-Dong Zhu; Qisen Yang; Qiyong Liu
The white‐bellied rat, Niviventer, is a genus endemic to Southeast Asia and China. However, the interspecific phylogenetic relationships and species diversity of this genus remain poorly understood. In the present study, single and multi‐locus analyses were performed. Phylogenetic reconstruction based on Cytochrome b (512 individuals, including data from GenBank) revealed five major clades with approximately 35 operational taxonomic units (OTUs), about twice the number of species recognized in the existing taxonomy. The first clade (N. langbianis species group) diverged earliest. The second clade (N. fulvescens species group) diverged in Southeast Asia, the southern and lower-altitude regions of the Hengduan Mountains, and Southeast China. The third clade (N. eha species group) is endemic to high altitudes in Northwest Yunnan and the central Himalaya. The fourth clade (N. andersoni species group) is mainly confined to alpine regions of the Hengduan Mountains. The fifth clade (N. confucianus species group) is mainly distributed in the northern and higher-altitude regions of the eastern Himalaya, the Hengduan Mountains and Taiwan, with the complex also invading central and northern China. Results from the combined dataset of four genes (Cytochrome b, Cytochrome oxidase subunit I, the D‐loop sequence of the mitochondrial genome and the first exon of the nuclear interphotoreceptor retinoid binding protein) for 82 representative individuals from China generally coincide with the single-gene result, with 12 OTUs identified. These results provide a preliminary framework for the existing classification of this highly diversified genus. The divergence time of Niviventer based on the four-gene topology was dated to the late Miocene (~6.41 Ma). Significant differences in general body form were detected among these units based on voucher specimens.
Moreover, geometric morphometric analysis of the cranium shape of voucher specimens indicated significant differences among five major species groups. Shape divergence of the cranium among several OTUs within the N. confucianus complex is also significant. Our results provide further evidence for rapid and highly underestimated diversification of Niviventer in both genetics and morphology.
Molecular Ecology Resources | 2015
Anna Papadopoulou; Douglas Chesters; Indiana Coronado; Gissela de la Cadena; Anabela Cardoso; Jazmina C. Reyes; Jean-Michel Maes; Ricardo Rueda; Jesús Gómez-Zurita
The rapid degradation of tropical forests makes it urgent to improve the efficiency of large‐scale biodiversity assessment. DNA barcoding can assist greatly in this task, but commonly used phenetic approaches for DNA‐based identifications rely on the existence of comprehensive reference databases, which are infeasible for hyperdiverse tropical ecosystems. Alternatively, phylogenetic methods are more robust to sparse taxon sampling but time‐consuming, while multiple alignment of species‐diagnostic, typically length‐variable, markers can be problematic across divergent taxa. We advocate the combination of phylogenetic and phenetic methods for taxonomic assignment of DNA‐barcode sequences against incomplete reference databases such as GenBank, and we developed a pipeline to implement this approach on large‐scale plant diversity projects. The pipeline workflow includes several steps: database construction and curation, query sequence clustering, sequence retrieval, distance calculation, multiple alignment and phylogenetic inference. We describe the strategies used to establish these steps and the optimization of parameters to fit the selected psbA‐trnH marker. We tested the pipeline using infertile plant samples and herbivore diet sequences from the highly threatened Nicaraguan seasonally dry forest, exploiting a valuable purpose‐built resource: a partial local reference database of plant psbA‐trnH. The selected methodology proved efficient and reliable for high‐throughput taxonomic assignment, and our results corroborate the advantage of applying ‘strict’ tree‐based criteria to avoid false positives. The pipeline tools are distributed as the script suite ‘BAGpipe’ (pipeline for Biodiversity Assessment using GenBank data), which can be readily adjusted to the purposes of other projects and applied to sequence‐based identification for any marker or taxon.
Methods in Ecology and Evolution | 2015
Douglas Chesters; Weimin Zheng; Chao-Dong Zhu
Summary
1. A number of systems have been developed for taxonomic identification of DNA sequence data. However, in eukaryotes, these systems are largely based on single predefined genes; they are thus vulnerable to biases from limited character sampling and unable to identify most sequences of genomic origin.
2. We here demonstrate an implementation for multigene DNA barcoding. First, a reference framework is built of frequently sequenced loci. Query sequence data are then organized by excising sequences homologous to references and assigning species names where the level of sequence similarity between query and reference falls within the (gene-appropriate) level of intraspecific variation usually observed. The approach is compared with some existing methods, including ‘bagpipe_phylo’, a re-implementation for taxonomic assignment on phylogenies.
3. Seventy-eight per cent of the species and 94% of the genera known to be present in arthropod test queries were correctly inferred by the proposed multigene system. Most critically, the rate of species identification was increased over a COI-only approach. Twenty-four per cent of species in the queries were found only in non-COI genes, with no clear reduction in the accuracy of species assignment at many of these other loci. Similarly, additional species assignments were made for a pooled metagenomic data set using non-COI columns. On a smaller query data set of 273 bee sequences, the accuracy of species assignment using a modified calculation of distances was indistinguishable from phylogeny-based taxonomic identification.
4. Standardized single-fragment DNA barcoding remains an invaluable tool in species identification for PCR-generated sequence data. The approach developed here supplements the established species-dense DNA barcode backbone with other genomic data, reducing error via integration of independent genetic loci and permitting additional identifications for non-barcode fragments. The latter will be particularly relevant in monitoring of community genomics using next-generation sequencing platforms.
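Point 2 of the summary assigns a species name only when the query-to-reference distance falls within that gene's usual intraspecific range. A minimal Python sketch of that rule follows, assuming homologous fragments have already been excised and aligned to equal length. The locus names, the `Reference` record and especially the threshold values in `LOCUS_THRESHOLDS` are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# HYPOTHETICAL per-locus ceilings on intraspecific distance. In practice each value would be
# calibrated from the variation observed within labelled species at that locus.
LOCUS_THRESHOLDS: Dict[str, float] = {"COI": 0.02, "28S": 0.005}


@dataclass
class Reference:
    species: str
    locus: str
    seq: str  # assumed aligned to the query, same length


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_species(query: str, locus: str, refs: List[Reference]) -> Optional[str]:
    """Return the nearest reference's species if it lies within the locus threshold, else None."""
    threshold = LOCUS_THRESHOLDS.get(locus)
    candidates = [r for r in refs if r.locus == locus]
    if threshold is None or not candidates:
        return None  # uncalibrated locus or no references: leave unassigned
    best = min(candidates, key=lambda r: p_distance(query, r.seq))
    return best.species if p_distance(query, best.seq) <= threshold else None


refs = [
    Reference("sp_A", "COI", "A" * 100),
    Reference("sp_B", "COI", "C" * 100),
    Reference("sp_A", "28S", "G" * 100),
]
print(assign_species("A" * 99 + "C", "COI", refs))  # sp_A (1% <= 2%)
print(assign_species("G" * 99 + "T", "28S", refs))  # None (1% > 0.5%)
```

The per-locus threshold is the point of the design: the same 1% divergence that supports a match at COI exceeds the (hypothetical) ceiling for the more conserved 28S fragment, so one global cutoff would either over-split COI or over-lump 28S.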
PLOS ONE | 2013
Cheng Ling; Tsuyoshi Hamada; Jianing Bai; Xianbin Li; Douglas Chesters; Weimin Zheng; Weifeng Shi
MrBayes is a model-based phylogenetic inference tool that uses Bayesian statistics. However, model-based assessment of phylogenetic trees adds to the computational burden of tree-searching, and so poses significant computational challenges. Graphics Processing Units (GPUs) have been proposed as high-performance, low-cost acceleration platforms, and several parallelized versions of the Metropolis Coupled Markov Chain Monte Carlo (MC3) algorithm in MrBayes that can run on GPUs have been presented. However, some bottlenecks decrease the efficiency of these implementations. To address these bottlenecks, we propose a tight GPU MC3 (tgMC3) algorithm. tgMC3 implements a different architecture from the one-to-one acceleration architecture employed in previously proposed methods. It merges multiple discrete GPU kernels according to their data dependencies, and hence decreases the number of kernels launched and the complexity of data transfer. We implemented tgMC3 and made performance comparisons with an earlier proposed algorithm, nMC3, and also with MrBayes MC3 under serial and multiple concurrent CPU processes. All of the methods were benchmarked on the same computing node from DEGIMA. Experiments indicate that tgMC3 outperforms nMC3 (v1.0) with speedup factors from 2.1 to 2.7×. In addition, tgMC3 outperforms the serial MrBayes MC3 by a factor of 6 to 30× when using a single GTX 480 card, whereas a speedup factor of around 51× can be achieved by using two GTX 480 cards on relatively long sequences. Moreover, tgMC3 was compared with MrBayes accelerated by BEAGLE, and achieved speedup factors from 3.7 to 5.7×. The reported performance improvement of tgMC3 is significant and appears to scale well with increasing dataset sizes. In addition, the strategy proposed in tgMC3 could benefit the acceleration of other Bayesian-based phylogenetic analysis methods using GPUs.
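MC3 runs several Markov chains side by side: one "cold" chain samples the posterior, while "heated" chains sample flattened versions of it, and swaps of state between chains let the cold chain escape local optima. It is this algorithm that nMC3 and tgMC3 port to the GPU. The toy Python sketch below shows the serial algorithm on a one-dimensional bimodal density instead of tree space. The target, step size and `heat` value are illustrative assumptions; the heating form 1/(1 + λi) follows the incremental scheme used by MrBayes, but nothing here reflects the tgMC3 GPU kernels.

```python
import math
import random
from typing import Callable, List, Tuple


def log_target(x: float) -> float:
    """Toy bimodal log-density (unnormalised): equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x - 4.0) ** 2
    b = -0.5 * (x + 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # log-sum-exp for stability


def accept(log_ratio: float, rng: random.Random) -> bool:
    """Metropolis acceptance test for a proposal with the given log acceptance ratio."""
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)


def mc3(log_pi: Callable[[float], float], n_iter: int, n_chains: int = 4,
        heat: float = 0.5, step: float = 1.0, seed: int = 0) -> Tuple[List[float], int]:
    """Serial Metropolis-coupled MCMC; returns the cold chain's samples and accepted swaps."""
    rng = random.Random(seed)
    # Incremental heating: chain 0 is cold (beta = 1), later chains are progressively flatter.
    betas = [1.0 / (1.0 + heat * i) for i in range(n_chains)]
    states = [0.0] * n_chains
    cold_samples: List[float] = []
    swaps_accepted = 0
    for _ in range(n_iter):
        # 1) Within-chain random-walk Metropolis update, each chain targeting pi(x) ** beta.
        for i in range(n_chains):
            proposal = states[i] + rng.gauss(0.0, step)
            if accept(betas[i] * (log_pi(proposal) - log_pi(states[i])), rng):
                states[i] = proposal
        # 2) Propose exchanging the states of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_swap = (betas[i] - betas[j]) * (log_pi(states[j]) - log_pi(states[i]))
        if accept(log_swap, rng):
            states[i], states[j] = states[j], states[i]
            swaps_accepted += 1
        cold_samples.append(states[0])  # only the cold chain samples the target itself
    return cold_samples, swaps_accepted


samples, swaps = mc3(log_target, 20000, seed=1)
print(f"accepted swaps: {swaps}, fraction of cold samples > 0: "
      f"{sum(s > 0 for s in samples) / len(samples):.2f}")
```

Each iteration evaluates the likelihood once per chain, independently; in real tree space that per-chain likelihood work dominates the run time, which is why it is the natural target for GPU parallelization.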
Systematic Biology | 2016
Douglas Chesters
Abstract Although comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging by both the scale and the structure of publicly available sequences. The distinct partition between gene‐rich (genomic) and species‐rich (DNA barcode) data is a feature of the data that has been largely overlooked, yet presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for draft construction of a species‐level phylogeny of insects (Class Insecta). Matrix‐building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species‐rich markers, whereas tree‐building requires hierarchical inference in order to capture species‐breadth while retaining deep‐level resolution. The resulting phylogeny of insects contains 49,358 species, 13,865 genera and 760 families. Deep‐level splits largely reflected previous findings for sections of the tree that are data rich or unambiguous, such as inter‐ordinal Endopterygota and Dictyoptera, the recently evolved and relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera), and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction and gene‐tree variation suggests that confidence in some relationships (such as in Polyneoptera) is lower than indicated by the matrix bootstrap method. To assess the utility of the insect tree as a tool in query profiling, several tree‐based taxonomic assignment methods are compared. Using test data sets with existing taxonomic annotations, a tendency is observed for greater accuracy of species‐level assignments when using a fixed comprehensive tree of life, in contrast to methods generating smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fit into supermatrices.
The resulting tree facilitates wider studies of insect diversification and application of advanced descriptions of diversity in community studies, among other presumed applications.
Systematic Biology | 2014
Douglas Chesters; Chao-Dong Zhu
Public DNA databases are composed of data from many different taxa, although the taxonomic annotation on sequences is not always complete, which impedes the utilization of mined data for species-level applications. There is much ongoing work on species identification and delineation based on the molecular data itself, although applying species clustering to whole databases requires consolidation of results from numerous undefined gene regions, and introduces significant obstacles in data organization and computational load. In the current paper, we demonstrate an approach for species delineation of a sequence database. All DNA sequences for the insects were obtained and processed. After filtration of duplicated data, delineation of the database into species or molecular operational taxonomic units (MOTUs) followed a three-step process in which (i) the genetic loci L are partitioned, (ii) the species S are delineated within each locus, then (iii) species units are matched across loci to form the matrix L × S, a set of global (multilocus) species units. Partitioning the database into a set of homologous gene fragments was achieved by Markov clustering using edge weights calculated from the amount of overlap between pairs of sequences, then delineation of species units and assignment of species names were performed for the set of genes necessary to capture most of the species diversity. The complexity of computing pairwise similarities for species clustering was substantial at the cytochrome oxidase subunit I locus in particular, but made feasible through the development of software that performs pairwise alignments within the taxonomic framework, while accounting for the different ranks at which sequences are labeled with taxonomic information. 
Across 24 different homologs, the unidentified sequences numbered approximately 194,000, containing 41,525 species IDs (98.7% of all found in the insect database), and were grouped into 59,173 single-locus MOTUs by hierarchical clustering under parameters optimized independently for each locus. Species units from different loci were matched using a multipartite matching algorithm to form multilocus species units with minimal incongruence between loci. After matching, the insect database as represented by these 24 loci was found to be composed of 78,091 species units in total. Of these, 38,574 units contained only species-labeled data, 34,891 contained only unlabeled data, and the remaining 4,626 were composed of both labeled and unlabeled sequences. In addition to giving estimates of the species diversity of sequence repositories, the protocol developed here will facilitate species-level applications of modern-day sequence data sets. In particular, the L × S matrix represents a post-taxonomic framework that can be used for species-level organization of metagenomic data, and incorporating these methods into phylogenetic pipelines will yield matrices more representative of species diversity.
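Step (i), partitioning the database into homologous gene fragments, used Markov clustering (MCL). MCL alternates expansion (matrix powers, which spread flow along random walks) and inflation (element-wise powers followed by column renormalization, which strengthen strong flows) on a column-stochastic matrix until flow is concentrated within clusters. A minimal NumPy sketch is below. The exact overlap-based edge weights of the study are not reproduced here, so the weight matrix is simply an input (for instance, a hypothetical fraction of shared aligned length per sequence pair); inflation 2.0 is a common default, not the study's setting.

```python
import numpy as np


def markov_cluster(weights: np.ndarray, expansion: int = 2, inflation: float = 2.0,
                   max_iter: int = 100, tol: float = 1e-9) -> list:
    """Minimal Markov clustering (MCL) of a symmetric, non-negative similarity matrix.

    Returns clusters as sorted lists of node (sequence) indices.
    """
    m = np.asarray(weights, dtype=float) + np.eye(len(weights))  # self-loops, as usual in MCL
    m = m / m.sum(axis=0, keepdims=True)  # make columns stochastic
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow along random walks
        m = m ** inflation  # inflation: boost strong flows, damp weak ones
        m = m / m.sum(axis=0, keepdims=True)
        if np.abs(m - prev).max() < tol:
            break
    # At convergence, each surviving (attractor) row's non-zero columns form one cluster.
    clusters = {tuple(int(k) for k in np.nonzero(row > 1e-6)[0]) for row in m}
    clusters.discard(())
    return sorted(list(c) for c in clusters)


# Toy overlap graph: sequences 0-2 overlap each other, as do 3-5; no overlap between groups.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
print(markov_cluster(W))  # [[0, 1, 2], [3, 4, 5]]
```

With real data the overlap graph is rarely this clean; a weak spurious edge between two homolog groups typically dilutes under inflation, which is one reason MCL suits partitioning a large, noisy similarity graph. Raising `inflation` yields finer clusters.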