Nathan T. Weeks
Iowa State University
Publications
Featured research published by Nathan T. Weeks.
Molecular Plant | 2012
Himabindu Kudapa; Arvind K. Bharti; Steven B. Cannon; Andrew D. Farmer; Benjamin Mulaosmanovic; Robin Kramer; Abhishek Bohra; Nathan T. Weeks; John A. Crow; Reetu Tuteja; Trushar Shah; Sutapa Dutta; Deepak K. Gupta; Archana Singh; Kishor Gaikwad; T. R. Sharma; Gregory D. May; Nagendra K. Singh; Rajeev K. Varshney
A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single-end reads, 2.19 million single-end FLX/454 reads, and 18 353 Sanger expressed sequence tags from more than 16 genotypes. The resulting transcriptome assembly, referred to as CcTA v2, comprised 21 434 transcript assembly contigs (TACs) with an N50 of 1510 bp, the largest being ∼8 kb. Of the 21 434 TACs, 16 622 (77.5%) could be mapped onto the soybean genome build 1.0.9 under fairly stringent alignment parameters. Based on knowledge of intron junctions, 10 009 primer pairs were designed from 5033 TACs for amplifying intron spanning regions (ISRs). Using in silico mapping of BAC-end-derived SSR loci of pigeonpea on the soybean genome as a reference, putative chromosome-level mapping positions were predicted for 6284 ISR markers, covering all 11 pigeonpea chromosomes. A subset of 128 ISR markers was analyzed on a set of eight genotypes. Of these, 116 markers were validated, and 70 markers showed one to three alleles, with an average polymorphism information content (PIC) value of 0.16. In summary, the CcTA v2 transcript assembly and ISR markers will serve as a useful resource to accelerate genetic research and breeding applications in pigeonpea.
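As a concrete illustration of two summary statistics reported above, the following minimal Python sketch computes an N50 from contig lengths and a PIC value from allele frequencies at one marker. The function names are hypothetical, and the PIC formula used here (1 - sum of p_i^2 - sum over i<j of 2 p_i^2 p_j^2) is one common definition assumed for illustration; the abstract does not state which variant the authors used.

```python
from itertools import combinations


def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must not be empty")


def pic(freqs):
    """Polymorphism information content for one marker (assumed common form).

    freqs: allele frequencies at the marker, summing to 1.
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - correction
```

For example, n50([2, 3, 4, 5, 6, 7, 8, 9, 10]) returns 8, because 10 + 9 + 8 = 27 is the first running total to reach half of the 54 bases, and pic([0.5, 0.5]) returns 0.375. A monomorphic marker (a single allele) has a PIC of 0.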
Nucleic Acids Research | 2016
Sudhansu Dash; Jacqueline D. Campbell; Ethalinda K. S. Cannon; Alan M. Cleary; Wei Huang; Scott R. Kalberer; Vijay Karingula; Alex G. Rice; Jugpreet Singh; Pooja E. Umale; Nathan T. Weeks; Andrew P. Wilkey; Andrew D. Farmer; Steven B. Cannon
The Legume Information System (LIS), at http://legumeinfo.org, is a genomic data portal (GDP) for the legume family. LIS provides access to genetic and genomic information for major crop and model legumes. With more than two dozen domesticated legume species, there are many specialists working on particular species, as well as many GDPs for these species. LIS has been redesigned in the last three years both to better integrate data sets across the crop and model legumes and to better accommodate specialized GDPs that serve particular legume species. To integrate data sets, LIS provides genome and map viewers, holds synteny mappings among all sequenced legume species, and provides a set of gene families to allow traversal among orthologous and paralogous sequences across the legumes. To better accommodate other specialized GDPs, LIS uses open-source GMOD components where possible and advocates the use of common data templates, formats, schemas and interfaces, so that data collected by one legume research community are accessible across all legume GDPs through similar interfaces and common APIs. This federated model for the legumes is managed as part of the ‘Legume Federation’ project (accessible via http://legumefederation.org), which can be thought of as an umbrella project encompassing LIS and other legume GDPs.
Extreme Science and Engineering Discovery Environment | 2013
Lars Koesterke; Kent Milfeld; Matthew W. Vaughn; Dan Stanzione; James E. Koltes; Nathan T. Weeks; James M. Reecy
The partial correlation coefficient with information theory (PCIT) method is an important technique for detecting interactions between networks. The PCIT algorithm has been used in the biological context to infer complex regulatory mechanisms and interactions in genetic networks, in genome-wide association studies, and in other similar problems. In this work, the PCIT algorithm is re-implemented with exemplary parallel, vector, I/O, memory and instruction optimizations for today's multi- and many-core architectures. The new code's development and tuning target the processor architectures of the Stampede supercomputer, but the resulting optimizations will also benefit other architectures. The Stampede system consists of an Intel Xeon E5 processor base system with an innovative component composed of Intel Xeon Phi coprocessors. Optimized results and an analysis are presented for both the Xeon and the Xeon Phi.
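PCIT (partial correlation coefficient with information theory) examines every trio of variables, so the work grows with the cube of the number of genes, which is why it rewards parallel and vector optimization. The following serial Python sketch shows the trio test as PCIT is commonly described: compute the three first-order partial correlations, take the tolerance ε as the mean ratio of partial to direct correlation, and drop a connection that is no stronger than both of the trio's other connections scaled by ε. The function names and the skipping of degenerate trios are assumptions for illustration; this is not the optimized code from the paper.

```python
import math
from itertools import combinations


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b, controlling for c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def pcit_edges(r):
    """Return the set of (i, j) edges (i < j) that survive the PCIT trio test.

    r is a symmetric correlation matrix given as a list of lists.
    An edge is removed if any trio containing it marks it as explained
    by the other two connections.
    """
    n = len(r)
    edges = set(combinations(range(n), 2))
    for x, y, z in combinations(range(n), 3):  # x < y < z
        rxy, rxz, ryz = r[x][y], r[x][z], r[y][z]
        # Assumption: skip trios where a ratio or partial correlation is undefined.
        if any(c == 0 or abs(c) >= 1 for c in (rxy, rxz, ryz)):
            continue
        pxy = partial_corr(rxy, rxz, ryz)  # x-y given z
        pxz = partial_corr(rxz, rxy, ryz)  # x-z given y
        pyz = partial_corr(ryz, rxy, rxz)  # y-z given x
        # Tolerance: mean ratio of partial to direct correlation.
        eps = (pxy / rxy + pxz / rxz + pyz / ryz) / 3
        # Discard an edge that is weaker than both other edges scaled by eps.
        if abs(rxy) <= abs(eps * rxz) and abs(rxy) <= abs(eps * ryz):
            edges.discard((x, y))
        if abs(rxz) <= abs(eps * rxy) and abs(rxz) <= abs(eps * ryz):
            edges.discard((x, z))
        if abs(ryz) <= abs(eps * rxy) and abs(ryz) <= abs(eps * rxz):
            edges.discard((y, z))
    return edges
```

With r = [[1, 0.1, 0.6], [0.1, 1, 0.6], [0.6, 0.6, 1]], the weak x-y link is explained by the shared neighbor z and is dropped, leaving {(0, 2), (1, 2)}. Every trio is independent of the others, which is what makes the triple loop a natural target for the threading and vectorization work described in the abstract.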
Concurrency and Computation: Practice and Experience | 2014
Lars Koesterke; James E. Koltes; Nathan T. Weeks; Kent Milfeld; Matthew W. Vaughn; James M. Reecy; Dan Stanzione
The partial correlation coefficient with information theory (PCIT) method is an important technique for detecting interactions between networks. The PCIT algorithm has been used in the biological context to infer complex regulatory mechanisms and interactions in genetic networks, in genome-wide association studies, and in other similar problems. In this work, the PCIT algorithm is re-implemented with exemplary parallel, vector, input/output (I/O), memory, and instruction optimizations for today's multi-core and many-core architectures. The new code's development and tuning target the processor architectures of the Stampede supercomputer, but the resulting optimizations will also benefit other architectures. The Stampede system consists of an Intel Xeon E5 processor base system with an innovative component consisting of Intel Xeon Phi coprocessors. Optimized results and an analysis are presented for both the Xeon and the Xeon Phi.
International Journal of High Performance Computing Applications | 2018
Nathan T. Weeks; Glenn R. Luecke; Brandon M. Groth; Marina Kraeva; Li Ma; Luke M. Kramer; James E. Koltes; James M. Reecy
epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) in quantitative-trait genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on large data sets with many quantitative traits and SNP markers. However, the falling cost of genotyping has led to an explosion of large-scale GWAS data sets that challenge EPISNPmpi’s ability to compute results in a reasonable amount of time. Therefore, we optimized epiSNP for modern multi-core and highly parallel many-core processors to efficiently handle these large data sets. This paper describes the serial optimizations, dynamic load balancing using MPI-3 RMA operations, and shared-memory parallelization with OpenMP to further enhance load balancing and allow execution on the Intel Xeon Phi coprocessor (MIC). For a large GWAS data set, our optimizations provided a 38.43× speedup over EPISNPmpi on 126 nodes using 2 MICs on TACC’s Stampede Supercomputer. We also describe a Coarray Fortran (CAF) version that demonstrates the suitability of PGAS languages for problems with this computational pattern. We show that the Coarray version performs competitively with the MPI version on the NERSC Edison Cray XC30 supercomputer. Finally, the performance benefits of hyper-threading for this application on Edison (average 1.35× speedup) are demonstrated.
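Dynamic load balancing of this kind is often built around a shared work counter that each worker atomically advances to claim its next chunk of work; in MPI-3, MPI_Fetch_and_op on an RMA window provides such an atomic fetch-and-add. The Python sketch below imitates that self-scheduling pattern with threads and a lock. The chunk size, class, and function names are hypothetical, and the abstract does not say that epiSNP uses exactly this scheme.

```python
import threading


class SharedCounter:
    """Stand-in for an atomic fetch-and-add on a shared counter.

    In an MPI-3 code this role could be played by MPI_Fetch_and_op on an
    RMA window; here a lock provides the same atomicity between threads.
    """

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, n):
        with self._lock:
            old = self._value
            self._value += n
            return old


def run_dynamic(num_tasks, chunk, num_workers, work):
    """Each worker repeatedly claims the next `chunk` task indices until none remain."""
    counter = SharedCounter()
    results = [None] * num_tasks
    claimed = [0] * num_workers  # tasks processed by each worker

    def worker(wid):
        while True:
            start = counter.fetch_and_add(chunk)
            if start >= num_tasks:
                return  # no work left
            for t in range(start, min(start + chunk, num_tasks)):
                results[t] = work(t)
                claimed[wid] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results, claimed
```

Because faster workers simply return to the counter more often, uneven task costs (such as SNP pairs with different amounts of missing data) even out without a fixed up-front partition. The chunk size trades counter contention against the granularity of the final balance.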
Scientific Reports | 2016
Vikas Belamkar; Andrew D. Farmer; Nathan T. Weeks; Scott R. Kalberer; William J. Blackmon; Steven B. Cannon
For species with potential as new crops, rapid improvement may be facilitated by new genomic methods. Apios (Apios americana Medik.), once a staple food source of Native Americans, produces protein-rich tubers, tolerates a wide range of soils, and symbiotically fixes nitrogen. We report the first high-quality de novo transcriptome assembly, an expression atlas, and a set of 58,154 SNP and 39,609 gene expression markers (GEMs) for characterization of a breeding collection. Both SNPs and GEMs identify six genotypic clusters in the collection. Transcripts mapped to the genome of Phaseolus vulgaris, another phaseoloid legume with the same chromosome number, provide provisional genetic locations for 46,852 SNPs. Linkage disequilibrium decays within 10 kb (based on the provisional genetic locations), consistent with outcrossing reproduction. SNPs and GEMs identify more than 21 marker-trait associations for at least 11 traits. This study demonstrates a holistic approach for mining plant collections to accelerate crop improvement.
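Linkage disequilibrium between two SNPs is often summarized as r², and plotting r² against physical distance shows how quickly LD decays. Below is a minimal Python sketch that estimates r² as the squared correlation of genotype dosages (0, 1, or 2 copies of an allele), a common approximation when haplotype phase is unknown. The function name and the dosage encoding are assumptions; the abstract does not state which LD estimator was used.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' allele dosages (0, 1, 2).

    Individuals with a missing genotype (None) at either SNP are skipped.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two individuals genotyped at both SNPs")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)
```

Computing this value for SNP pairs binned by distance (using provisional positions such as those inferred from the Phaseolus vulgaris genome) and averaging within each bin gives an LD decay curve.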
Trust, Security and Privacy in Computing and Communications | 2015
Glenn R. Luecke; Nathan T. Weeks; Brandon M. Groth; Marina Kraeva; Li Ma; Luke M. Kramer; James E. Koltes; James M. Reecy
epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) that affect quantitative traits in genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to address this computationally expensive analysis on data sets with many quantitative traits and markers. However, the explosion in genome sequencing will lead to large-scale data sets that will overwhelm EPISNPmpi's ability to compute results in a reasonable amount of time. Thus, epiSNP was rewritten to handle these large data sets efficiently. This was accomplished by performing serial optimizations, improving MPI load balancing, and introducing OpenMP parallel directives to further enhance load balancing and allow execution on the Intel Xeon Phi coprocessor (MIC). These additions resulted in new scalable versions of epiSNP using MPI, MPI+OpenMP, and MPI+OpenMP with one or two MICs. For a large data set of 774,660 SNPs and 1,634 individuals, the runtime on 126 nodes of TACC's Stampede Supercomputer was 10.61 minutes without MICs and 5.13 minutes with 2 MICs. These correspond to speedups over EPISNPmpi of 17X without MICs and 36X with 2 MICs.
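The figures in this abstract can be cross-checked with a little arithmetic. The short Python script below, assuming that every distinct SNP pair is tested, counts the pairs for the 774,660-SNP data set and back-calculates the EPISNPmpi runtime implied by each reported speedup. Because the 17X and 36X figures are rounded, the two implied baselines agree only approximately; the baseline runtime itself is not given in the abstract.

```python
from math import comb

snps = 774_660
pairs = comb(snps, 2)  # distinct SNP pairs, assuming every pair is tested
print(f"{pairs:,} SNP pairs")  # 300,048,670,470 SNP pairs

# Runtimes (minutes) and speedups reported in the abstract, 126 Stampede nodes
t_no_mic, t_two_mic = 10.61, 5.13
s_no_mic, s_two_mic = 17, 36  # speedup over EPISNPmpi

# Implied EPISNPmpi runtime: baseline = speedup * new runtime
print(round(s_no_mic * t_no_mic, 1))  # 180.4
print(round(s_two_mic * t_two_mic, 1))  # 184.7

# Gain from adding two MICs on the same 126 nodes
print(round(t_no_mic / t_two_mic, 2))  # 2.07
```

Roughly 3 × 10^11 pairwise tests explains why the serial cost per test and the balance of work across nodes dominate the runtime.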
Cluster Computing | 2017
Nathan T. Weeks; Glenn R. Luecke
SAMtools is a widely used genomics application for post-processing high-throughput sequence alignment data. Such sequence alignment data are commonly sorted to make downstream analysis more efficient. However, this sorting process itself can be computationally- and I/O-intensive: high-throughput sequence alignment files in the de facto standard binary alignment/map (BAM) format can be many gigabytes in size, and may need to be decompressed before sorting and compressed afterwards. As a result, BAM-file sorting can be a bottleneck in genomics workflows. This paper describes a case study on the performance analysis and optimization of SAMtools for sorting large BAM files. OpenMP task parallelism and memory optimization techniques resulted in a speedup of 5.9X versus the upstream SAMtools 1.3.1 for an internal (in-memory) sort of 24.6 GiB of compressed BAM data (102.6 GiB uncompressed) with 32 processor cores, while a 1.98X speedup was achieved for an external (out-of-core) sort of a 271.4 GiB BAM file.
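An external (out-of-core) sort is usually a two-phase merge sort: sort memory-sized chunks and write them out as runs, then stream a k-way merge of the runs. The Python sketch below applies that idea to tab-separated text lines with a hypothetical ref_id, position, name layout standing in for coordinate-sorted alignment records. It is not SAMtools code; SAMtools works on binary BAM records rather than text lines.

```python
import heapq
import os
import tempfile


def external_sort(records, max_in_memory, key):
    """Sort an iterable of text lines that may not fit in memory (generator).

    Phase 1 sorts chunks of at most `max_in_memory` lines and writes each
    as a temporary run file. Phase 2 lazily k-way merges the runs.
    """
    run_paths = []
    chunk = []

    def flush():
        chunk.sort(key=key)
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(line + "\n" for line in chunk)
        run_paths.append(path)
        chunk.clear()

    # Phase 1: produce sorted runs.
    for rec in records:
        chunk.append(rec)
        if len(chunk) >= max_in_memory:
            flush()
    if chunk:
        flush()

    # Phase 2: merge all runs, reading each one sequentially.
    files = [open(p) for p in run_paths]
    try:
        streams = [(line.rstrip("\n") for line in f) for f in files]
        yield from heapq.merge(*streams, key=key)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

For a coordinate sort, a key such as `lambda line: (int(line.split("\t")[0]), int(line.split("\t")[1]))` orders records by reference and position. Memory use is bounded by one chunk during phase 1 and by one buffered line per run during the merge, at the cost of writing and rereading every record once.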
European Conference on Parallel Processing | 2016
Nathan T. Weeks; Glenn R. Luecke
SAMtools is a suite of tools widely used in genomics workflows for post-processing sequence alignment data from large high-throughput sequencing data sets. A common use of SAMtools is to sort files in the standard Binary Alignment/Map (BAM) format emitted by many sequence aligners. This can be computationally- and I/O-intensive: BAM files can be many gigabytes in size, and may need to be decompressed before sorting and compressed afterwards. As a result, BAM-file sorting can be a bottleneck in genomics workflows. This paper presents a case study on the performance characterization and optimization of BAM sorting with SAMtools. OpenMP task parallelism, used to enhance concurrency, and memory optimization techniques were employed in both SAMtools and its underlying library, HTSlib. Using all 32 processor cores on the benchmark system, the optimizations resulted in a speedup of 3.92X for an in-memory sort of 24.6 GiB of BAM data (102.6 GiB uncompressed), while a 1.55X speedup was achieved for an out-of-core sort.
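BAM files are stored in BGZF, a series of independently compressed gzip blocks, which is what makes their compression and decompression amenable to task parallelism. The Python sketch below compresses fixed-size blocks concurrently and concatenates the resulting gzip members; a standard multi-member gzip reader can decompress the result. It is a simplified stand-in, not HTSlib's API: real BGZF blocks also carry a block-size extra field and the file ends with an empty EOF block, both omitted here, and the 64 KiB block size is an assumption. Whether CPython threads yield real speedup here depends on the interpreter; the sketch only shows that the blocks can be processed independently and reassembled in order.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # assumption: roughly mirrors BGZF's 64 KiB block limit


def compress_blocks(data, workers=4):
    """Compress fixed-size blocks independently and concatenate the gzip members."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so block order is preserved.
        members = list(pool.map(gzip.compress, blocks))
    return b"".join(members)
```

Because each block is a complete gzip member, gzip.decompress(compress_blocks(data)) returns the original bytes, and a reader can also start decompressing at any block boundary.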
Concurrency and Computation: Practice and Experience | 2014
Nathan T. Weeks; Marina Kraeva; Glenn R. Luecke
Concurrent programming has become a common means of harnessing the potential performance of multi-core processors. System V (SysV) message queues and semaphores have been used since the mid-1970s to implement inter-process concurrency, but they are difficult to use, and bindings exist for few programming languages. This paper introduces ipcmd, a high-level command-line interface to SysV message queues and semaphores. ipcmd provides an easy-to-use interface for synchronizing concurrent processes, allowing application developers to efficiently prototype, debug, and test the use of SysV semaphores and message queues in applications. Easy-to-understand applications of semaphores are illustrated using simple shell scripts.
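A counting semaphore initialized to N lets at most N holders proceed at once: each waits (decrements) on entry and signals (increments) on exit. The abstract illustrates semaphore use with shell scripts; as a language-level analogue, the Python sketch below uses threading.Semaphore to cap how many jobs run concurrently and reports the peak concurrency observed. This is an in-process thread primitive, not a SysV semaphore, and it does not use ipcmd; the function name is hypothetical.

```python
import threading


def run_limited(jobs, limit):
    """Run callables in threads, letting at most `limit` run at the same time.

    Returns the highest number of jobs observed running simultaneously.
    """
    sem = threading.Semaphore(limit)
    lock = threading.Lock()  # protects the bookkeeping below
    state = {"running": 0, "peak": 0}

    def wrapper(job):
        with sem:  # wait (P) on entry, signal (V) on exit
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                job()
            finally:
                with lock:
                    state["running"] -= 1

    threads = [threading.Thread(target=wrapper, args=(j,)) for j in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

The same structure, with processes in place of threads and a SysV semaphore in place of threading.Semaphore, is the classic way to throttle a batch of shell jobs to a fixed number of concurrent workers.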