Arun S. Seetharam
Iowa State University
Publication
Featured research published by Arun S. Seetharam.
Nucleic Acids Research | 2015
Chaoyou Xue; Arun S. Seetharam; Olga Musharova; Konstantin Severinov; Stan J. J. Brouns; Andrew J. Severin; Dipali G. Sashital
CRISPR–Cas (clustered regularly interspaced short palindromic repeats-CRISPR associated) systems allow bacteria to adapt to infection by acquiring ‘spacer’ sequences from invader DNA into genomic CRISPR loci. Cas proteins use RNAs derived from these loci to target cognate sequences for destruction through CRISPR interference. Mutations in the protospacer adjacent motif (PAM) and seed regions block interference but promote rapid ‘primed’ adaptation. Here, we use multiple spacer sequences to reexamine the PAM and seed sequence requirements for interference and priming in the Escherichia coli Type I-E CRISPR–Cas system. Surprisingly, CRISPR interference is far more tolerant of mutations in the seed and the PAM than previously reported, and this mutational tolerance, as well as priming activity, is highly dependent on spacer sequence. We identify a large number of functional PAMs that can promote interference, priming or both activities, depending on the associated spacer sequence. Functional PAMs are preferentially acquired during unprimed ‘naïve’ adaptation, leading to a rapid priming response following infection. Our results provide numerous insights into the importance of both spacer and target sequences for interference and priming, and reveal that priming is a major pathway for adaptation during initial infection.
PeerJ | 2013
Arun S. Seetharam; Gary W. Stuart
Type IIB restriction endonucleases are site-specific endonucleases that cut both strands of double-stranded DNA upstream and downstream of their recognition sequences. These restriction enzymes have recognition sequences that are generally interrupted and range from 5 to 7 bases long. They produce DNA fragments which are uniformly small, ranging from 21 to 33 base pairs in length (without cohesive ends). The fragments are generated from throughout the entire length of a genomic DNA providing an excellent fractional representation of the genome. In this study we simulated restriction enzyme digestions on 21 sequenced genomes of various Drosophila species using the predicted targets of 16 Type IIB restriction enzymes to effectively produce a large and arbitrary selection of loci from these genomes. The fragments were then used to compare organisms and to calculate the distance between genomes in pair-wise combination by counting the number of shared fragments between the two genomes. Phylogenetic trees were then generated for each enzyme using this distance measure and the consensus was calculated. The consensus tree obtained agrees well with the currently accepted tree for the Drosophila species. We conclude that multi-locus sub-genomic representation combined with next generation sequencing, especially for individuals and species without previous genome characterization, can accelerate studies of comparative genomics and the building of accurate phylogenetic trees.
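The pairwise comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract says only that distance is derived from the number of shared fragments, so the Jaccard distance used here is an assumption, and the function names and toy fragment sets are hypothetical.

```python
def shared_fragment_distance(fragments_a, fragments_b):
    """Jaccard distance between two collections of fragment sequences.

    Assumption: the abstract does not give the exact formula, so the
    shared-fragment count is normalised by the size of the union.
    """
    a, b = set(fragments_a), set(fragments_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes):
    """Pairwise distances for a dict {genome_name: iterable of fragments}."""
    names = sorted(genomes)
    return {
        (x, y): shared_fragment_distance(genomes[x], genomes[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical toy data; real fragments would be 21 to 33 bp long.
toy_genomes = {
    "species_1": {"ACGT", "GGCC", "TTAA"},
    "species_2": {"ACGT", "GGCC", "CCGG"},
    "species_3": {"AAAA", "TTAA"},
}
toy_distances = distance_matrix(toy_genomes)
```

A matrix of this kind could then be passed to any distance-based tree-building method, one matrix per enzyme.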
BMC Genomics | 2013
Arun S. Seetharam; Gary W. Stuart
Background: The C2H2 zinc-finger (ZNF) containing gene family is one of the largest and most complex gene families in metazoan genomes. These genes are known to exist in almost all eukaryotes, and they constitute a major subset of eukaryotic transcription factors. The genes of this family usually occur as clusters in genomes and are thought to have undergone a massive expansion in vertebrates by multiple tandem duplication events (BMC Evol Biol 8:176, 2008).
Results: In this study, we combined two popular approaches for homolog detection, Reciprocal Best Hit (RBH) (Proc Natl Acad Sci USA 95:6239–6244, 1998) and Hidden Markov model (HMM) profile search (Bioinformatics 14:755-763, 1998), on a diverse set of complete genomes of 124 eukaryotic species ranging from excavates to humans to identify all detectable members of 37 C2H2 ZNF gene families. We succeeded in identifying 3,890 genes as distinct members of 37 C2H2 gene families. These 37 families are distributed among the eukaryotes as progressive additions of gene blocks with increasing complexity of the organisms. The first block featuring the protists had 7 families, the second block featuring plants had 2 families, the third block featuring the fungi had 2 families (one of which was also present in plants) and the final block consisted of metazoans with 25 families. Among the metazoans, the simpler unicellular metazoans had just 15 of the 25 families while most of the bilaterians had all 25 families making up a total of 37 families. Multiple potential examples of lineage-specific gene duplications and gene losses were also observed.
Conclusions: Our hybrid approach combines features of both the RBH and HMM methods for homolog detection. This largely automated technique is much faster than manual methods and is able to detect homologs accurately and efficiently among a diverse set of organisms. Our analysis of the 37 evolutionarily conserved C2H2 ZNF gene families revealed a stepwise appearance of ZNF families, agreeing well with the phylogenetic relationship of the organisms compared and their presumed stepwise increase in complexity (Science 300:1694, 2003).
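The Reciprocal Best Hit criterion mentioned in this abstract can be illustrated with a short sketch. It covers only the RBH half of the hybrid approach and is not the authors' pipeline: the input format (nested dictionaries of similarity scores), the function names, and the toy scores are all assumptions made for illustration.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: {query_id: {subject_id: similarity_score}}
    Queries with no hits are skipped.
    """
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores, e.g. bit scores from two all-vs-all searches.
genome_a_vs_b = {
    "a1": {"b1": 250.0, "b2": 90.0},
    "a2": {"b2": 120.0, "b1": 100.0},
    "a3": {"b1": 80.0},
}
genome_b_vs_a = {
    "b1": {"a1": 245.0, "a3": 70.0},
    "b2": {"a2": 118.0},
}
rbh_pairs = reciprocal_best_hits(genome_a_vs_b, genome_b_vs_a)
```

In the toy data, `a3` has `b1` as its best hit, but `b1` prefers `a1`, so `a3` is not reported. This asymmetry is what makes RBH conservative.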
PLOS ONE | 2016
Basudev Chowdhury; Arun S. Seetharam; Zhiping Wang; Yunlong Liu; Amy C. Lossie; Jyothi Thimmapuram; Joseph Irudayaraj
Cells alter their gene expression in response to exposure to various environmental changes. Epigenetic mechanisms such as DNA methylation are believed to regulate the alterations in gene expression patterns. In vitro and in vivo studies have documented changes in cellular proliferation, cytoskeletal remodeling, signal transduction, bone mineralization and immune deficiency under the influence of microgravity conditions experienced in space. However, microgravity-induced changes in the epigenome have not been well characterized. In this study we have used Next-generation Sequencing (NGS) to profile ground-based “simulated” microgravity-induced changes in DNA methylation (5-methylcytosine or 5mC), hydroxymethylation (5-hydroxymethylcytosine or 5hmC), and simultaneous gene expression in cultured human lymphoblastoid cells. Our results indicate that simulated microgravity induced alterations in the methylome (~60% of the differentially methylated regions or DMRs are hypomethylated and ~92% of the differentially hydroxymethylated regions or DHMRs are hyperhydroxymethylated). Simulated microgravity also induced differential expression in 370 transcripts that were associated with crucial biological processes such as oxidative stress response, carbohydrate metabolism and regulation of transcription. While we were not able to obtain any global trend correlating the changes of methylation/hydroxylation with gene expression, we have been able to profile the simulated microgravity-induced changes of 5mC over some of the differentially expressed genes, including five genes undergoing differential methylation over their promoters and twenty-five genes undergoing differential methylation over their gene bodies. To the best of our knowledge, this is the first NGS-based study to profile epigenomic patterns induced by short-term exposure to simulated microgravity, and we believe that our findings can be a valuable resource for future explorations.
BMC Research Notes | 2012
Arun S. Seetharam; Gary W. Stuart
Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees.
Results: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed.
Conclusions: These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between Drosophila species. Furthermore, protein filtering can be effectively applied to reduce incongruence in the dataset as well as to generate alternative phylogenies.
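The vector-and-angle idea in this abstract can be sketched in a simplified form. Note what is left out: the SVD dimensionality reduction that the method depends on is omitted here, so the code sums raw peptide counts rather than reduced per-protein vectors. The peptide length `k=2`, the function names, and the toy sequences are assumptions for illustration only.

```python
import math
from collections import Counter


def peptide_counts(protein, k=2):
    """Count overlapping peptides of length k in one protein sequence."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def species_vector(proteins, k=2):
    """Sum the peptide count vectors of all proteins from one species.

    Simplification: the published method sums SVD-reduced vectors;
    raw counts are summed here instead.
    """
    total = Counter()
    for protein in proteins:
        total.update(peptide_counts(protein, k))
    return total


def angle_between(u, v):
    """Angle in radians between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compute an angle for an empty vector")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)
```

Species with similar peptide composition yield a small angle, and species sharing no peptides yield an angle of π/2. These angles serve as the pairwise distances for tree construction.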
ACS Synthetic Biology | 2017
Mingfeng Cao; Arun S. Seetharam; Andrew J. Severin; Zengyi Shao
Centromeres (CENs) are the chromosomal regions promoting kinetochore formation for faithful chromosome segregation. In yeasts, CENs have been recognized as the essential elements for extra-chromosomal DNA stabilization. However, the epigeneticity of CENs makes their localization on individual chromosomes very challenging, especially in many poorly studied nonconventional yeast species. Previously, we applied a stepwise method to identify a 500-bp CEN5 from Scheffersomyces stipitis chromosome 5 and experimentally confirmed its critical role in improving plasmid stability. Here we report a library-based strategy that integrates in silico GC3 chromosome scanning and high-throughput functional screening, which enabled the efficient isolation of all eight S. stipitis centromeres with a 16,000-fold reduction in sequence. Further identification of a 125-bp CEN core sequence that appears multiple times on each chromosome but all in the unique signature GC3-valley indicates that CEN location might be accurately discerned by their local GC3 percentages in a subgroup of yeasts.
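GC3 is the fraction of G or C bases at third codon positions, and a "GC3 valley" is a chromosomal stretch where that fraction dips. The sketch below shows one way such a scan could look. It is not the authors' procedure: the abstract does not specify the scanning scheme, so the per-gene input, the window size, and the function names are hypothetical.

```python
def gc3_fraction(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS.

    Trailing bases that do not complete a codon are ignored.
    """
    third_positions = cds.upper()[2::3]
    if not third_positions:
        raise ValueError("coding sequence is shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def lowest_gc3_window(gc3_values, window=3):
    """Return (start_index, mean_gc3) of the window with the lowest mean.

    gc3_values: per-gene GC3 fractions ordered along one chromosome
    (an assumed input layout).
    """
    if window < 1 or window > len(gc3_values):
        raise ValueError("window must be between 1 and len(gc3_values)")
    means = [
        sum(gc3_values[i:i + window]) / window
        for i in range(len(gc3_values) - window + 1)
    ]
    start = min(range(len(means)), key=means.__getitem__)
    return start, means[start]
```

Calling `lowest_gc3_window` on the ordered GC3 values of one chromosome gives a candidate valley. In the study such candidates were confirmed by functional screening, not by computation alone.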
Proceedings of the 2015 XSEDE Conference on Scientific Advancements Enabled by Enhanced Cyberinfrastructure | 2015
Arun S. Seetharam; Antonio Gomez; Catherine M. Purcell; John R. Hyde; Philip D. Blood; Andrew J. Severin
The development of genomic resources of non-model organisms is now becoming commonplace as the cost of sequencing continues to decrease. The Genome Informatics Facility in collaboration with the Southwest Fisheries Science Center (SWFSC), NOAA is creating these resources for sustainable aquaculture in Seriola lalandi. Gene prediction and annotation are common steps in the pipeline to generate genomic resources, which are computationally intensive and time-consuming. In our steps to create genomic resources for Seriola lalandi, we found BLAST to be one of our most rate-limiting steps. Therefore, we took advantage of our XSEDE Extended Collaborative Support Services (ECSS) to reduce the amount of time required to process our transcriptome data by 300 percent. In this paper, we describe an optimized method for the BLAST tool on the Stampede cluster, which works with any existing datasets or database, without any modification. At modest core counts, our results are similar to the MPI-enabled BLAST algorithm (mpiBLAST), but also allow the much needed and improved flexibility of output formats that the latest versions of BLAST provide. Reducing this time-consuming bottleneck in BLAST will be broadly applicable to the annotation of large sequencing datasets for any organism.
bioRxiv | 2018
Zebulun W. Arendsee; Jing Li; Urminder Singh; Arun S. Seetharam; Karin S. Dorman; Eve Syrkin Wurtele
Motivation: The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. Currently, there are no general pipelines for this task. We present an R package, phylostratr, to fill this gap, making high-quality phylostratigraphic analysis accessible to non-specialists.
Results: Phylostratigraphic analysis entails searching for homologs within increasingly broad clades. The highest clade that contains all homologs of a gene is that gene’s phylostratum. We have created a general R-based framework, phylostratr, for estimating the phylostratum of every gene in a species. The program can fully automate an analysis: select species for a balanced representation of each stratum, retrieve the sequences from UniProt, build BLAST databases, run BLAST, infer homologs for each gene against each subject species, determine phylostrata, and return summaries and diagnostics. phylostratr allows extensive customization. A user may: modify the automatically-generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. phylostratr also offers proteome quality assessments, false-positive diagnostics, and checks for missing organelle genomes. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae.
Availability: phylostratr source code and vignettes are available on GitHub at https://github.com/arendsee/phylostratr
Contact: [email protected]
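The core assignment rule from this abstract (a gene's phylostratum is the broadest clade containing a detected homolog) can be sketched in a few lines. The code is written in Python for illustration and does not use or mirror the phylostratr R API. The data layout, the function name, and the simplified five-level lineage are assumptions.

```python
def assign_phylostrata(lineage, species_stratum, homolog_hits):
    """Assign each gene the broadest clade in which a homolog was found.

    lineage: clade names ordered from broadest (index 0) to the focal species.
    species_stratum: {subject_species: index in `lineage` of the narrowest
        clade that species shares with the focal species}.
    homolog_hits: {gene: set of subject species with an inferred homolog}.
    Genes without any hit are assigned to the focal species itself.
    """
    focal_level = len(lineage) - 1
    result = {}
    for gene, species in homolog_hits.items():
        levels = [species_stratum[s] for s in species]
        result[gene] = lineage[min(levels, default=focal_level)]
    return result


# Hypothetical, heavily simplified lineage for an Arabidopsis thaliana focus.
lineage = [
    "cellular organisms",
    "Eukaryota",
    "Viridiplantae",
    "Brassicaceae",
    "Arabidopsis thaliana",
]
species_stratum = {
    "Escherichia coli": 0,
    "Saccharomyces cerevisiae": 1,
    "Oryza sativa": 2,
    "Brassica rapa": 3,
}
hits = {
    "geneA": {"Escherichia coli", "Oryza sativa"},
    "geneB": {"Brassica rapa"},
    "geneC": set(),
}
strata = assign_phylostrata(lineage, species_stratum, hits)
```

Here `geneA` is placed in the oldest stratum because of its bacterial homolog, while `geneC`, with no hits, is treated as specific to the focal species. How reliably homologs are inferred determines the quality of these calls, which is why the package exposes the classifier for tuning.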
bioRxiv | 2018
Rick E. Masonbrink; Thomas R. Maier; Usha Muppirala; Arun S. Seetharam; Etienne Lord; Parijat S. Juvale; Jeremy Schmutz; Nathan T. Johnson; Dmitry Korkin; Melissa G. Mitchum; Benjamin Mimee; Sebastian Eves-van den Akker; Matthew E. Hudson; Andrew J. Severin; Thomas J. Baum
Heterodera glycines, commonly referred to as the soybean cyst nematode (SCN), is an obligatory and sedentary plant parasite that causes over a billion-dollar yield loss to soybean production annually. Although there are genetic determinants that render soybean plants resistant to certain nematode genotypes, resistant soybean cultivars are increasingly ineffective because their multi-year usage has selected for virulent H. glycines populations. The parasitic success of H. glycines relies on the comprehensive re-engineering of an infection site into a syncytium, as well as the long-term suppression of host defense to ensure syncytial viability. At the forefront of these complex molecular interactions are effectors, the proteins secreted by H. glycines into host root tissues. The mechanisms of effector acquisition, diversification, and selection need to be understood before effective control strategies can be developed, but the lack of an annotated genome has been a major roadblock. Here, we use PacBio long-read technology to assemble a H. glycines genome of 738 contigs into 123Mb with annotations for 29,769 genes. The genome contains significant numbers of repeats (34%), tandem duplicates (18.7Mb), and horizontal gene transfer events (151 genes). Using previously published effector sequences, the newly generated H. glycines genome, and comparisons to other nematode genomes, we investigate the evolutionary mechanisms responsible for the emergence and diversification of effector genes.
Molecular Biology and Evolution | 2018
Joshua T. Trujillo; Arun S. Seetharam; Matthew B. Hufford; Mark A. Beilstein; Rebecca A. Mosher
Gene duplication is an important driver for the evolution of new genes and protein functions. Duplication of DNA-dependent RNA polymerase (Pol) II subunits within plants led to the emergence of RNA Pol IV and V complexes, each of which possesses unique functions necessary for RNA-directed DNA Methylation. Comprehensive identification of Pol V subunit orthologs across the monocot radiation revealed a duplication of the largest two subunits within the grasses (Poaceae), including critical cereal crops. These paralogous Pol subunits display sequence conservation within catalytic domains, but their carboxy terminal domains differ in length and character of the Ago-binding platform, suggesting unique functional interactions. Phylogenetic analysis of the catalytic region indicates positive selection on one paralog following duplication, consistent with retention via neofunctionalization. Positive selection on residue pairs that are predicted to interact between subunits suggests that paralogous subunits have evolved specific assembly partners. Additional Pol subunits as well as Pol-interacting proteins also possess grass-specific paralogs, supporting the hypothesis that a novel Pol complex with distinct function has evolved in the grass family, Poaceae.