William G. Farmerie | Researchain

Publication

Featured research published by William G. Farmerie.

The ISME Journal | 2007

Pyrosequencing enumerates and contrasts soil microbial diversity.

Luiz Fernando Wurdig Roesch; Roberta R. Fulthorpe; Alberto Riva; George Casella; Alison K M Hadwin; Angela D. Kent; Samira H. Daroub; Flávio Anastácio de Oliveira Camargo; William G. Farmerie; Eric W. Triplett

Estimates of the number of species of bacteria per gram of soil vary between 2000 and 8.3 million (Gans et al., 2005; Schloss and Handelsman, 2006). The highest estimate suggests that the number may be so large as to be impractical to test by amplification and sequencing of the highly conserved 16S rRNA gene from soil DNA (Gans et al., 2005). Here we present the use of high throughput DNA pyrosequencing and statistical inference to assess bacterial diversity in four soils across a large transect of the western hemisphere. The number of bacterial 16S rRNA sequences obtained from each site varied from 26 140 to 53 533. The most abundant bacterial groups in all four soils were the Bacteroidetes, Betaproteobacteria and Alphaproteobacteria. Using three estimators of diversity, the maximum number of unique sequences (operational taxonomic units roughly corresponding to the species level) never exceeded 52 000 in these soils at the lowest level of dissimilarity. Furthermore, the bacterial diversity of the forest soil was phylum rich compared to the agricultural soils, which are species rich but phylum poor. The forest site also showed far less diversity of the Archaea with only 0.009% of all sequences from that site being from this group as opposed to 4%–12% of the sequences from the three agricultural sites. This work is the most comprehensive examination to date of bacterial diversity in soil and suggests that agricultural management of soil may significantly influence the diversity of bacteria and archaea.
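The abstract above does not name its three diversity estimators. As a hedged illustration of how a richness estimator extrapolates from observed operational taxonomic units (OTUs), here is a minimal sketch of the bias-corrected Chao1 estimator. Choosing Chao1 is an assumption made for illustration, and the function name `chao1` and the example counts are hypothetical, not data from the study.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate.

    otu_counts: iterable of read counts, one entry per observed OTU.
    Chao1 is assumed here for illustration; the abstract does not
    say which estimators the authors used.
    """
    counts = [c for c in otu_counts if c > 0]
    observed = len(counts)                        # OTUs seen at least once
    singletons = sum(1 for c in counts if c == 1)  # OTUs seen exactly once
    doubletons = sum(1 for c in counts if c == 2)  # OTUs seen exactly twice
    # Rare OTUs hint at how many OTUs were missed entirely.
    unseen = singletons * (singletons - 1) / (2 * (doubletons + 1))
    return observed + unseen


# Hypothetical counts: 7 observed OTUs, 3 singletons, 2 doubletons.
print(chao1([5, 3, 1, 1, 1, 2, 2]))  # 8.0
```

The more singletons a sample contains relative to doubletons, the larger the estimated number of unseen OTUs, which is why deeper sequencing tends to stabilize such estimates.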

BMC Genomics | 2008

High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome.

Evandro Novaes; Derek R. Drost; William G. Farmerie; Georgios Pappas; Dario Grattapaglia; Ronald R. Sederoff; Matias Kirst

Background: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effectively the sequencing of large numbers of short reads for species with essentially no prior gene sequence information will support contig assemblies and sequence annotation.

Results: With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool comprised of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average 4x smaller than in synonymous, suggesting purifying selection. Non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection in agreement with previous research.

Conclusion: In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition we demonstrated that SNPs sampled at large scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.

Proceedings of the National Academy of Sciences of the United States of America | 2011

The genome of the fire ant Solenopsis invicta

Yannick Wurm; John L. Wang; Miguel Corona; Sanne Nygaard; Brendan G. Hunt; Krista K. Ingram; Mingkwan Nipitwattanaphon; Dietrich Gotzek; Michiel B. Dijkstra; Jan Oettler; Fabien Comtesse; Cheng-Jen Shih; Wen-Jer Wu; Chin-Cheng Yang; Jérôme Thomas; Emmanuel Beaudoing; Sylvain Pradervand; Volker Flegel; Erin D. Cook; Roberto Fabbretti; Heinz Stockinger; Li Long; William G. Farmerie; Jane Oakey; Jacobus J. Boomsma; Pekka Pamilo; Soojin V. Yi; Jürgen Heinze; Michael A. D. Goodisman; Laurent Farinelli

Ants have evolved very complex societies and are key ecosystem members. Some ants, such as the fire ant Solenopsis invicta, are also major pests. Here, we present a draft genome of S. invicta, assembled from Roche 454 and Illumina sequencing reads obtained from a focal haploid male and his brothers. We used comparative genomic methods to obtain insight into the unique features of the S. invicta genome. For example, we found that this genome harbors four adjacent copies of vitellogenin. A phylogenetic analysis revealed that an ancestral vitellogenin gene first underwent a duplication that was followed by possibly independent duplications of each of the daughter vitellogenins. The vitellogenin genes have undergone subfunctionalization with queen- and worker-specific expression, possibly reflecting differential selection acting on the queen and worker castes. Additionally, we identified more than 400 putative olfactory receptors of which at least 297 are intact. This represents the largest repertoire reported so far in insects. S. invicta also harbors an expansion of a specific family of lipid-processing genes, two putative orthologs to the transformer/feminizer sex differentiation gene, a functional DNA methylation system, and a single putative telomerase ortholog. EST data indicate that this S. invicta telomerase ortholog has at least four spliceforms that differ in their use of two sets of mutually exclusive exons. Some of these and other unique aspects of the fire ant genome are likely linked to the complex social behavior of this species.

Nucleic Acids Research | 2009

ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences

Yijun Sun; Yunpeng Cai; Li Liu; Fahong Yu; Michael L. Farrell; William McKendree; William G. Farmerie

Recent metagenomics studies of environmental samples suggested that microbial communities are much more diverse than previously reported, and deep sequencing will significantly increase the estimate of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, computational methods for analyzing large collections of 16S ribosomal sequences are limited. We proposed a new algorithm, referred to as ESPRIT, which addresses several computational issues with prior methods. We developed two versions of ESPRIT, one for personal computers (PCs) and one for computer clusters (CCs). The PC version is used for small- and medium-scale data sets and can process several tens of thousands of sequences within a few minutes, while the CC version is for large-scale problems and is able to analyze several hundreds of thousands of reads within one day. Large-scale experiments are presented that clearly demonstrate the effectiveness of the newly proposed algorithm. The source code and user guide are freely available at http://www.biotech.ufl.edu/people/sun/esprit.html.

Cell | 2006

Neuronal Transcriptome of Aplysia: Neuronal Compartments and Circuitry

Leonid L. Moroz; John R. Edwards; Sathyanarayanan V. Puthanveettil; Andrea B. Kohn; Thomas Ha; Andreas Heyland; Bjarne Knudsen; Anuj Sahni; Fahong Yu; Li Liu; Sami Jezzini; Peter Lovell; William Iannucculli; Minchen Chen; Tuan Nguyen; Huitao Sheng; Regina Shaw; Sergey Kalachikov; Yuri V. Panchin; William G. Farmerie; James J. Russo; Jingyue Ju; Eric R. Kandel

Molecular analyses of Aplysia, a well-established model organism for cellular and systems neural science, have been seriously handicapped by a lack of adequate genomic information. By sequencing cDNA libraries from the central nervous system (CNS), we have identified over 175,000 expressed sequence tags (ESTs), of which 19,814 are unique neuronal gene products and represent 50%-70% of the total Aplysia neuronal transcriptome. We have characterized the transcriptome at three levels: (1) the central nervous system, (2) the elementary components of a simple behavior, the gill-withdrawal reflex, by analyzing sensory, motor, and serotonergic modulatory neurons, and (3) processes of individual neurons. In addition to increasing the amount of available gene sequences of Aplysia by two orders of magnitude, this collection represents the largest database available for any member of the Lophotrochozoa and therefore provides additional insights into evolutionary strategies used by this highly successful diversified lineage, one of the three proposed superclades of bilateral animals.

BMC Plant Biology | 2006

Rapid and accurate pyrosequencing of angiosperm plastid genomes.

Michael J. Moore; Amit Dhingra; Pamela S. Soltis; Regina Shaw; William G. Farmerie; Kevin M. Folta; Douglas E. Soltis

Background: Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae).

Results: More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions.

Conclusion: Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically.

Molecular Ecology Resources | 2009

Evaluating high‐throughput sequencing as a method for metagenomic analysis of nematode diversity

Dorota L. Porazinska; Robin M. Giblin-Davis; Lina Faller; William G. Farmerie; Natsumi Kanzaki; Krystalynne Morris; Thomas O. Powers; Abraham E. Tucker; Way Sung; W. Kelley Thomas

Nematodes play an important role in ecosystem processes, yet the relevance of nematode species diversity to ecology is unknown. Because nematode identification of all individuals at the species level using standard techniques is difficult and time‐consuming, nematode communities are not resolved down to the species level, leaving ecological analysis ambiguous. We assessed the suitability of massively parallel sequencing for analysis of nematode diversity from metagenomic samples. We set up four artificial metagenomic samples involving 41 diverse reference nematodes in known abundances. Two samples came from pooling polymerase chain reaction products amplified from single nematode species. Two additional metagenomic samples consisted of amplified products of DNA extracted from pooled nematode species. Amplified products involved two rapidly evolving ~400‐bp sections coding for the small and large subunit of rRNA. The total number of reads ranged from 4159 to 14771 per metagenomic sample. Of these, 82% were > 199 bp in length. Among the reads > 199 bp, 86% matched the referenced species with less than three nucleotide differences from a reference sequence. Although neither rDNA section recovered all nematode species, the use of both loci improved the detection level of nematode species from 90 to 97%. Overall, results support the suitability of massively parallel sequencing for identification of nematodes. In contrast, the frequency of reads representing individual species did not correlate with the number of individuals in the metagenomic samples, suggesting that further methodological work is necessary before it will be justified for inferring the relative abundances of species within a nematode community.

Bioinformatics | 2007

Improved breast cancer prognosis through the combination of clinical and genetic markers

Yijun Sun; Steve Goodison; Jian Li; Li Liu; William G. Farmerie

Motivation: Accurate prognosis of breast cancer can spare a significant number of breast cancer patients from receiving unnecessary adjuvant systemic treatment and its related expensive medical costs. Recent studies have demonstrated the potential value of gene expression signatures in assessing the risk of post-surgical disease recurrence. However, these studies all attempt to develop genetic marker-based prognostic systems to replace the existing clinical criteria, while ignoring the rich information contained in established clinical markers. Given the complexity of breast cancer prognosis, a more practical strategy would be to utilize both clinical and genetic marker information that may be complementary.

Methods: A computational study is performed on publicly available microarray data, which has spawned a 70-gene prognostic signature. The recently proposed I-RELIEF algorithm is used to identify a hybrid signature through the combination of both genetic and clinical markers. A rigorous experimental protocol is used to estimate the prognostic performance of the hybrid signature and other prognostic approaches. Survival data analysis is performed to compare different prognostic approaches.

Results: The hybrid signature performs significantly better than other methods, including the 70-gene signature, clinical markers alone and the St. Gallen consensus criterion. At the 90% sensitivity level, the hybrid signature achieves 67% specificity, as compared to 47% for the 70-gene signature and 48% for the clinical markers. The odds ratio of the hybrid signature for developing distant metastases within five years between the patients with a good prognosis signature and the patients with a bad prognosis is 21.0 (95% CI: 6.5-68.3), far higher than either genetic or clinical markers alone.

Availability: The breast cancer dataset is available at www.nature.com and Matlab codes are available upon request.
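To make the odds ratio and its confidence interval in this abstract concrete, the following sketch computes both from a 2x2 table. The counts are hypothetical and are not taken from the study. The Wald interval on the log odds ratio is an assumed method, since the abstract does not say how its interval was derived, and `odds_ratio_with_ci` is an illustrative name.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: poor-prognosis signature, developed metastases
    b: poor-prognosis signature, no metastases
    c: good-prognosis signature, developed metastases
    d: good-prognosis signature, no metastases
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio) under the Wald approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts chosen only to show the arithmetic.
estimate, lower, upper = odds_ratio_with_ci(20, 10, 5, 25)
print(f"OR = {estimate:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

The interval is asymmetric around the estimate because it is built on the log scale, which matches the shape of the interval quoted in the abstract.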

Briefings in Bioinformatics | 2012

A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis

Yijun Sun; Yunpeng Cai; Susan M. Huse; Rob Knight; William G. Farmerie; Xiaoyu Wang; Volker Mai

Recent advances in massively parallel sequencing technology have created new opportunities to probe the hidden world of microbes. Taxonomy-independent clustering of the 16S rRNA gene is usually the first step in analyzing microbial communities. Dozens of algorithms have been developed in the last decade, but a comprehensive benchmark study is lacking. Here, we survey algorithms currently used by microbiologists, and compare seven representative methods in a large-scale benchmark study that addresses several issues of concern. A new experimental protocol was developed that allows different algorithms to be compared using the same platform, and several criteria were introduced to facilitate a quantitative evaluation of the clustering performance of each algorithm. We found that existing methods vary widely in their outputs, and that inappropriate use of distance levels for taxonomic assignments likely resulted in substantial overestimates of biodiversity in many studies. The benchmark study identified our recently developed ESPRIT-Tree, a fast implementation of the average linkage-based hierarchical clustering algorithm, as one of the best algorithms available in terms of computational efficiency and clustering accuracy.
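The clustering step described above can be sketched with a naive average-linkage procedure that merges clusters until the closest pair exceeds a chosen distance level. This is not ESPRIT-Tree, which the abstract describes as a fast implementation. The version below is a cubic-time illustration, and the function name, the distance matrix, and the threshold are all hypothetical.

```python
def average_linkage_otus(dist, threshold):
    """Group sequences into OTUs by naive average-linkage clustering.

    dist: symmetric matrix (list of lists) of pairwise distances.
    threshold: merging stops once the smallest average distance
               between any two clusters exceeds this value.
    Returns a list of clusters, each a list of sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def average_distance(c1, c2):
        # Mean of all pairwise distances between members of c1 and c2.
        total = sum(dist[i][j] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical distances: sequences 0-1 and 2-3 are near-identical pairs.
distances = [
    [0.00, 0.01, 0.20, 0.20],
    [0.01, 0.00, 0.20, 0.20],
    [0.20, 0.20, 0.00, 0.02],
    [0.20, 0.20, 0.02, 0.00],
]
print(average_linkage_otus(distances, threshold=0.03))  # [[0, 1], [2, 3]]
```

The threshold plays the role of the distance level mentioned in the abstract: raising it merges more sequences into fewer OTUs, so the choice directly changes the resulting diversity estimate.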

BMC Plant Biology | 2005

Floral gene resources from basal angiosperms for comparative genomics research

Victor A. Albert; Douglas E. Soltis; John E. Carlson; William G. Farmerie; P. Kerr Wall; Daniel C. Ilut; Teri M Solow; Lukas A. Mueller; Lena Landherr; Yi Hu; Matyas Buzgo; Sangtae Kim; Mi-Jeong Yoo; Michael W. Frohlich; Rafael Perl-Treves; Scott E. Schlarbaum; Barbara J Bliss; Xiaohong Zhang; Steven D. Tanksley; David G. Oppenheimer; Pamela S. Soltis; Hong Ma; Claude W. dePamphilis; Jim Leebens-Mack

Background: The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants.

Results: Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms.

Conclusion: Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and analyses of adaptive molecular evolution. Since not all genes in the floral transcriptome will be associated with flowering, these EST resources will also be of interest to plant scientists working on other functions, such as photosynthesis, signal transduction, and metabolic pathways.
