Frank Alex Feltus | Researchain

Publication

Featured researches published by Frank Alex Feltus.

Genome Biology | 2013

The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

Juan Carlos Motamayor; Keithanne Mockaitis; Jeremy Schmutz; Niina Haiminen; Donald Livingstone; Omar E. Cornejo; Seth D. Findley; Ping Zheng; Filippo Utro; Stefan Royaert; Christopher A. Saski; Jerry Jenkins; Ram Podicheti; Meixia Zhao; Brian E. Scheffler; Joseph C Stack; Frank Alex Feltus; Guiliana Mustiga; Freddy Amores; Wilbert Phillips; Jean Philippe Marelli; Gregory D. May; Howard Shapiro; Jianxin Ma; Carlos Bustamante; Raymond J. Schnell; Dorrie Main; Don Gilbert; Laxmi Parida; David N. Kuhn

Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions: We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
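The contig and scaffold N50 values quoted in this abstract follow the usual assembly-statistics definition: the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that calculation is given here. The function name `n50` and the toy lengths are illustrative assumptions, not data from the Matina 1-6 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not real assembly data.
toy_scaffolds = [40, 30, 20, 5, 5]
print(n50(toy_scaffolds))  # 30
```

In the toy data the total is 100, so the running sum first reaches 50 at the second-longest sequence, which gives an N50 of 30.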

Biotechnology for Biofuels | 2012

Bioenergy grass feedstock: current options and prospects for trait improvement using emerging genetic, genomic, and systems biology toolkits

Frank Alex Feltus; Joshua P. Vandenbrink

For lignocellulosic bioenergy to become a viable alternative to traditional energy production methods, rapid increases in conversion efficiency and biomass yield must be achieved. Increased productivity in bioenergy production can be achieved through concomitant gains in processing efficiency as well as genetic improvement of feedstock that have the potential for bioenergy production at an industrial scale. The purpose of this review is to explore the genetic and genomic resource landscape for the improvement of a specific bioenergy feedstock group, the C4 bioenergy grasses. First, bioenergy grass feedstock traits relevant to biochemical conversion are examined. Then we outline genetic resources available for bioenergy grasses for mapping bioenergy traits to DNA markers and genes. This is followed by a discussion of genomic tools and how they can be applied to understanding bioenergy grass feedstock trait genetic mechanisms leading to further improvement opportunities.

Database | 2013

Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases

Lacey-Anne Sanderson; Stephen P. Ficklin; Chun-Huai Cheng; Sook Jung; Frank Alex Feltus; Kirstin E. Bett; Dorrie Main

Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. Database URL: http://tripal.info/

PLOS ONE | 2013

Massive-Scale Gene Co-Expression Network Construction and Robustness Testing Using Random Matrix Theory

Scott M. Gibson; Stephen P. Ficklin; Sven Isaacson; Feng Luo; Frank Alex Feltus; Melissa C. Smith

The study of gene relationships and their effect on biological function and phenotype is a focal point in systems biology. Gene co-expression networks built using microarray expression profiles are one technique for discovering and interpreting gene relationships. A knowledge-independent thresholding technique, such as Random Matrix Theory (RMT), is useful for identifying meaningful relationships. Highly connected genes in the thresholded network are then grouped into modules that provide insight into their collective functionality. While it has been shown that co-expression networks are biologically relevant, it has not been determined to what extent any given network is functionally robust given perturbations in the input sample set. Such a test requires hundreds of networks, and hence a tool to construct them rapidly. To examine functional robustness of networks with varying input, we enhanced an existing RMT implementation for improved scalability and tested functional robustness of human (Homo sapiens), rice (Oryza sativa) and budding yeast (Saccharomyces cerevisiae). We demonstrate a dramatic decrease in network construction time and computational requirements and show that despite some variation in global properties between networks, functional similarity remains high. Moreover, the biological function captured by co-expression networks thresholded by RMT is highly robust.

BMC Genomics | 2011

New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

Christopher A. Saski; Zhigang Li; Frank Alex Feltus; Hong Luo

Background: Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). As in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results: A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of the switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions: The construction of the first switchgrass BAC library and comparative analysis of homoeologous regions harboring OsBRI1 orthologs present a glimpse into the switchgrass genome structure and complexity. Data obtained demonstrate the feasibility of using HICF fingerprinting to resolve the homoeologous chromosomes of the two distinct genomes in switchgrass, providing a robust and accurate BAC-based physical platform for this species. The genomic resources and sequence data generated will lay the foundation for deciphering the switchgrass genome and lead the way for an accurate genome sequencing strategy.
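The "approximately 10 times" coverage figure in this abstract can be checked from the numbers it reports: clone count, fraction of clones with inserts, mean insert size, and effective genome size. The Python sketch below does that arithmetic. The function name `library_coverage` is hypothetical, and applying the 95% insert fraction to the clone count is an assumption about how the estimate was made; without it the result is about 11x, which still rounds to the stated figure.

```python
def library_coverage(n_clones, insert_fraction, mean_insert_kb, genome_gb):
    """Estimate genome equivalents represented in a BAC library.

    Total cloned DNA = clones with inserts * mean insert size,
    divided by the (effective) genome size.
    """
    kb_per_gb = 1e6
    total_insert_gb = n_clones * insert_fraction * mean_insert_kb / kb_per_gb
    return total_insert_gb / genome_gb


# Values taken from the abstract: 147,456 clones, 95% with inserts,
# 120 kb mean insert, ~1.6 Gb effective genome.
coverage = library_coverage(147_456, 0.95, 120, 1.6)
print(round(coverage, 1))  # 10.5
```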

BMC Systems Biology | 2013

Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study.

Frank Alex Feltus; Stephen P. Ficklin; Scott M. Gibson; Melissa C. Smith

Background: In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, also limiting the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. Results: A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network. Conclusions: Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired.
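The claim that pooling samples from different conditions shrinks the network can be illustrated with a toy case. The Python sketch below is not the authors' pipeline: it uses known condition labels instead of k-means clustering, and a fixed cutoff instead of RMT thresholding. The gene values, the `pearson` helper, and the 0.9 cutoff are all made up for illustration. Two genes are perfectly co-expressed within each condition but in opposite directions, so the pooled correlation vanishes and the edge is lost.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values for two genes in two conditions.
gene1_a, gene2_a = [1, 2, 3, 4], [1, 2, 3, 4]  # condition A: move together
gene1_b, gene2_b = [1, 2, 3, 4], [4, 3, 2, 1]  # condition B: move oppositely

CUTOFF = 0.9  # illustrative fixed threshold, standing in for RMT

pooled_r = pearson(gene1_a + gene1_b, gene2_a + gene2_b)
group_rs = [pearson(gene1_a, gene2_a), pearson(gene1_b, gene2_b)]

pooled_edge = abs(pooled_r) >= CUTOFF
group_edges = [abs(r) >= CUTOFF for r in group_rs]

print(pooled_edge)  # False: the relationship is invisible when pooled
print(group_edges)  # [True, True]: recovered within each condition
```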

Functional Plant Biology | 2008

Transcriptome analysis of leaf tissue from Bermudagrass (Cynodon dactylon) using a normalised cDNA library

Changsoo Kim; Cheol Seong Jang; Terry L. Kamps; Jon S. Robertson; Frank Alex Feltus; Andrew H. Paterson

A normalised cDNA library was constructed from Bermudagrass to gain insight into the transcriptome of Cynodon dactylon L. A total of 15 588 high-quality expressed sequence tags (ESTs) from the cDNA library were subjected to The Institute for Genomic Research Gene Indices clustering tools to produce a unigene set. A total of 9414 unigenes were obtained from the high-quality ESTs and only 39.6% of the high-quality ESTs were redundant, indicating that the normalisation procedure was effective. A large-scale comparative genomic analysis of the unigenes was carried out using publicly available tools, such as BLAST, InterProScan and Gene Ontology. The unigenes were also subjected to a search for EST-derived simple sequence repeats (EST-SSRs) and conserved-intron scanning primers (CISPs), which are useful as DNA markers. Although the candidate EST-SSRs and CISPs found in the present study need to be empirically tested, they are expected to be useful as DNA markers for many purposes, including comparative genomic studies of grass species, by virtue of their significant similarities to EST sequences from other grasses. Thus, knowledge of Cynodon ESTs will empower turfgrass research by providing homologues for genes that are thought to confer important functions in other plants.
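The 39.6% redundancy figure in this abstract is consistent with a simple reading of the two counts it reports. The Python sketch below assumes redundancy means the fraction of ESTs that did not contribute a new unigene, which is an interpretation rather than a definition stated in the abstract. The function name `est_redundancy` is hypothetical.

```python
def est_redundancy(n_ests, n_unigenes):
    """Fraction of ESTs that collapse into an already-seen unigene."""
    return 1 - n_unigenes / n_ests


# Counts from the abstract: 15,588 high-quality ESTs, 9,414 unigenes.
redundancy_pct = 100 * est_redundancy(15_588, 9_414)
print(f"{redundancy_pct:.1f}%")  # 39.6%
```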

BMC Genomics | 2011

Sequencing of a QTL-rich region of the Theobroma cacao genome using pooled BACs and the identification of trait specific candidate genes

Frank Alex Feltus; Christopher A. Saski; Keithanne Mockaitis; Niina Haiminen; Laxmi Parida; Zachary D. Smith; James Ford; Margaret Staton; Stephen P. Ficklin; Barbara Blackmon; Chun-Huai Cheng; Raymond J. Schnell; David N. Kuhn; Juan-Carlos Motamayor

Background: BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results: This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions: Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.

PLOS ONE | 2013

A Systems-Genetics Approach and Data Mining Tool to Assist in the Discovery of Genes Underlying Complex Traits in Oryza sativa

Stephen P. Ficklin; Frank Alex Feltus

Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, both covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits. Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance.

Insect Molecular Biology | 2014

Studying the organization of genes encoding plant cell wall degrading enzymes in Chrysomela tremula provides insights into a leaf beetle genome.

Yannick Pauchet; Christopher A. Saski; Frank Alex Feltus; Isabelle Luyten; Hadi Quesneville; David G. Heckel

The ability of herbivorous beetles from the superfamilies Chrysomeloidea and Curculionoidea to degrade plant cell wall polysaccharides has only recently begun to be appreciated. The presence of plant cell wall degrading enzymes (PCWDEs) in the beetle's digestive tract makes this degradation possible. Sequences encoding these beetle‐derived PCWDEs were originally identified from transcriptomes and strikingly resemble those of saprophytic and phytopathogenic microorganisms, raising questions about their origin; e.g. are they insect‐ or microorganism‐derived? To demonstrate unambiguously that the genes encoding PCWDEs found in beetle transcriptomes are indeed of insect origin, we generated a bacterial artificial chromosome library from the genome of the leaf beetle Chrysomela tremula, containing 18 432 clones with an average size of 143 kb. After hybridizing this library with probes derived from 12 C. tremula PCWDE‐encoding genes and sequencing the positive clones, we demonstrated that the latter genes are encoded by the insect's genome and are surrounded by genes possessing orthologues in the genome of Tribolium castaneum as well as in three other beetle genomes. Our analyses showed that although the level of overall synteny between C. tremula and T. castaneum seems high, the degree of microsynteny between both species is relatively low, in contrast to the more closely related Colorado potato beetle.
