Daniel Rokhsar
Lawrence Berkeley National Laboratory
Publication
Featured research published by Daniel Rokhsar.
Nucleic Acids Research | 2012
Igor V. Grigoriev; Henrik Nordberg; Igor Shabalov; Andrea Aerts; Mike Cantor; David M. Goodstein; Alan Kuo; Simon Minovitsky; Roman Nikitin; Robin A. Ohm; Robert Otillar; Alexander Poliakov; Igor Ratnere; Robert Riley; Tatyana Smirnova; Daniel Rokhsar; Inna Dubchak
The Department of Energy (DOE) Joint Genome Institute (JGI) is a national user facility with massive-scale DNA sequencing and analysis capabilities dedicated to advancing genomics for bioenergy and environmental applications. Beyond generating tens of trillions of DNA bases annually, the Institute develops and maintains data management systems and specialized analytical capabilities to manage and interpret complex genomic data sets, and to enable an expanding community of users around the world to analyze these data in different contexts over the web. The JGI Genome Portal (http://genome.jgi.doe.gov) provides a unified access point to all JGI genomic databases and analytical tools. A user can find all DOE JGI sequencing projects and their status, search for and download assemblies and annotations of sequenced genomes, and interactively explore those genomes and compare them with other sequenced microbes, fungi, plants or metagenomes using specialized systems tailored to each particular class of organisms. We describe here the general organization of the Genome Portal and the most recent addition, MycoCosm (http://jgi.doe.gov/fungi), a new integrated fungal genomics resource.
Current Opinion in Structural Biology | 1998
Vijay S. Pande; Alexander Y. Grosberg; Toyoichi Tanaka; Daniel Rokhsar
Theoretical studies using simplified models of proteins have shed light on the general heteropolymeric aspects of the folding problem. Recent work has emphasized the statistical aspects of folding pathways. In particular, progress has been made in characterizing the ensemble of transition state conformations and elucidating the role of intermediates. These advances suggest a reconciliation between the new ensemble approaches and the classical view of a folding pathway.
Neuron | 1997
Marla B. Feller; Daniel A. Butts; Holly L. Aaron; Daniel Rokhsar; Carla J. Shatz
In the developing mammalian retina, spontaneous waves of action potentials are present in the ganglion cell layer weeks before vision. These waves are known to be generated by a synaptically connected network of amacrine cells and retinal ganglion cells, and exhibit complex spatiotemporal patterns, characterized by shifting domains of coactivation. Here, we present a novel dynamical model consisting of two coupled populations of cells that quantitatively reproduces the experimentally observed domain sizes, interwave intervals, and wavefront velocity profiles. Model and experiment together show that the highly correlated activity generated by retinal waves can be explained by a combination of random spontaneous activation of cells and the past history of local retinal activity.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2014
Evangelos Georganas; Aydin Buluç; Jarrod Chapman; Leonid Oliker; Daniel Rokhsar; Katherine A. Yelick
De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous fragments called reads. We study optimized parallelization of the most time-consuming phases of Meraculous, a state-of-the-art production assembler. First, we present a new parallel algorithm for k-mer analysis, characterized by intensive communication and I/O requirements, and reduce the memory requirements by 6.93×. Second, we efficiently parallelize de Bruijn graph construction and traversal, which necessitates a distributed hash table and is a key component of most de novo assemblers. We provide a novel algorithm that leverages one-sided communication capabilities of Unified Parallel C (UPC) to facilitate the requisite fine-grained parallelism and avoidance of data hazards, while analytically proving its scalability properties. Overall results show unprecedented performance and efficient scaling on up to 15,360 cores of a Cray XC30, on the human genome as well as the challenging wheat genome, with performance improvement from days to seconds.
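As a rough illustration of the two phases named in this abstract (k-mer analysis, then de Bruijn graph traversal), the following is a minimal serial sketch in Python. It is not the Meraculous or UPC implementation: an in-memory `Counter` and `set` stand in for the distributed hash table, reverse complements and sequencing errors are ignored, and the function names (`count_kmers`, `successors`, `predecessors`, `extend_right`) are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def successors(kmer, kmers):
    """K-mers in the set that overlap this k-mer by k-1 bases on the right."""
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in kmers]

def predecessors(kmer, kmers):
    """K-mers in the set that overlap this k-mer by k-1 bases on the left."""
    return [b + kmer[:-1] for b in BASES if b + kmer[:-1] in kmers]

def extend_right(seed, kmers):
    """Walk right from a seed k-mer while the path is unambiguous.

    The walk stops at a fork, a dead end, a join (a successor with more
    than one predecessor), or a k-mer that was already visited.
    """
    contig = seed
    current = seed
    visited = {seed}
    while True:
        nxt = successors(current, kmers)
        if len(nxt) != 1:
            break
        candidate = nxt[0]
        if candidate in visited or len(predecessors(candidate, kmers)) != 1:
            break
        contig += candidate[-1]
        visited.add(candidate)
        current = candidate
    return contig

# Toy input: two overlapping reads.
reads = ["ACGTTGCA", "GTTGCATC"]
counts = count_kmers(reads, k=4)
contig = extend_right("ACGT", set(counts))
```

In this toy setting every k-mer lookup is a local dictionary access. In a distributed setting each lookup may touch remote memory, which is why the abstract emphasizes a distributed hash table and fine-grained one-sided communication.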
IEEE International Conference on High Performance Computing, Data, and Analytics | 2015
Evangelos Georganas; Aydin Buluç; Jarrod Chapman; Steven A. Hofmeyr; Chaitanya Aluru; Rob Egan; Leonid Oliker; Daniel Rokhsar; Katherine A. Yelick
De novo whole genome assembly reconstructs genomic sequences from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, the first high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. First, we significantly improve scalability of parallel k-mer analysis for complex repetitive genomes that exhibit skewed frequency distributions. Next, we optimize the traversal of the de Bruijn graph of k-mers by employing a novel communication-avoiding parallel algorithm in a variety of use-case scenarios. Finally, we parallelize the Meraculous scaffolding modules by leveraging the one-sided communication capabilities of the Unified Parallel C while effectively mitigating load imbalance. Large-scale results on a Cray XC30 using grand-challenge genomes demonstrate efficient performance and scalability on thousands of cores. Overall, our pipeline accelerates Meraculous performance by orders of magnitude, enabling the complete assembly of the human genome in just 8.4 minutes on 15K cores of the Cray XC30, and creating unprecedented capability for extreme-scale genomic analysis.
International Parallel and Distributed Processing Symposium | 2015
Evangelos Georganas; Aydin Buluç; Jarrod Chapman; Leonid Oliker; Daniel Rokhsar; Katherine A. Yelick
Aligning a set of query sequences to a set of target sequences is an important task in bioinformatics. In this work we present merAligner, a highly parallel sequence aligner that implements a seed-and-extend algorithm and employs parallelism in all of its components. MerAligner relies on a high performance distributed hash table (seed index) and uses one-sided communication capabilities of Unified Parallel C to facilitate fine-grained parallelism. We leverage communication optimizations at the construction of the distributed hash table and software caching schemes to reduce communication during the aligning phase. Additionally, merAligner preprocesses the target sequences to extract properties enabling exact sequence matching with minimal communication. Finally, we efficiently parallelize the I/O intensive phases and implement an effective load balancing scheme. Results show that merAligner exhibits efficient scaling up to thousands of cores on a Cray XC30 supercomputer using real human and wheat genome data while significantly outperforming existing parallel alignment tools.
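The seed-and-extend idea mentioned in this abstract can be sketched serially in a few lines of Python. This is an illustrative toy under stated assumptions, not merAligner itself: a local `dict` replaces the distributed seed index, extension is ungapped exact matching rather than whatever extension step the tool actually uses, and all function names (`build_seed_index`, `extend_ungapped`, `align`) are hypothetical.

```python
from collections import defaultdict

def build_seed_index(targets, k):
    """Map each k-mer (seed) to the (target_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for tid, seq in enumerate(targets):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((tid, pos))
    return index

def extend_ungapped(query, target, q_pos, t_pos, k):
    """Grow an exact seed match left and right while the bases agree.

    Returns (query_start, target_start, match_length).
    """
    left = 0
    while (q_pos - left > 0 and t_pos - left > 0
           and query[q_pos - left - 1] == target[t_pos - left - 1]):
        left += 1
    right = 0
    while (q_pos + k + right < len(query)
           and t_pos + k + right < len(target)
           and query[q_pos + k + right] == target[t_pos + k + right]):
        right += 1
    return q_pos - left, t_pos - left, left + k + right

def align(query, targets, index, k):
    """Return the longest exact hit as (length, target_id, query_start, target_start).

    Returns None when no seed of the query occurs in any target.
    """
    best = None
    for q_pos in range(len(query) - k + 1):
        seed = query[q_pos:q_pos + k]
        for tid, t_pos in index.get(seed, ()):
            q_start, t_start, length = extend_ungapped(
                query, targets[tid], q_pos, t_pos, k)
            if best is None or length > best[0]:
                best = (length, tid, q_start, t_start)
    return best

# Toy data: the query occurs exactly in target 0 starting at offset 3.
targets = ["TTACGTACGGA", "GGGCCCAAAT"]
index = build_seed_index(targets, k=4)
hit = align("CGTACGG", targets, index, k=4)
miss = align("TTTTTTT", targets, index, k=4)
```

Each seed lookup here is a local dictionary access. With a distributed index the same lookup can require communication, which is the cost the caching and exact-matching optimizations described in the abstract aim to reduce.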
Bioinformatics and Biomedicine | 2014
Veronika Strnadova; Aydin Buluç; Jarrod Chapman; John R. Gilbert; Joseph E. Gonzalez; Stefanie Jegelka; Daniel Rokhsar; Leonid Oliker
High-throughput “next generation” genome sequencing technologies are producing a flood of inexpensive genetic information that is invaluable to genomics research. Sequences of millions of genetic markers are being produced, providing genomics researchers with the opportunity to construct high-resolution genetic maps for many complicated genomes. However, the current generation of genetic mapping tools was designed for the small data setting, and is now limited by the prohibitively slow clustering algorithms employed in the genetic marker-clustering stage. In this work, we present a new approach to genetic mapping based on a fast clustering algorithm that exploits the geometry of the data. Our theoretical and empirical analysis shows that the algorithm can correctly recover linkage groups. Using synthetic and real-world data, including the grand-challenge wheat genome, we demonstrate that our approach can quickly process orders of magnitude more genetic markers than existing tools while retaining, and in some cases even improving, the quality of genetic marker clusters.
Archive | 2005
Brett M. Tyler; Sucheta Tripathi; Andrea Aerts; Douda Bensasson; Paramvir Dehal; Inna Dubchak; Matteo Garbelotto; Mark Gijzen; Wayne Huang; Kelly Ivors; Rays H.Y. Jiang; Sophien Kamoun; Konstantinos Krampis; Kurt Lamour; Hayes McDonald; Mónica Medina; Paul Morris; Nik Putnam; Sam Rash; Asaf Salamov; Brian M. Smith; Joe Smith; Astrid Terry; Trudy A. Torto; Igor V. Grigoriev; Daniel Rokhsar; Jeffrey L. Boore
The approximately 60 species of Phytophthora are all destructive pathogens, causing rots of roots, stems, leaves and fruits of a wide range of agriculturally and ornamentally important plants (1). Some species, such as P. cinnamomi, P. parasitica and P. cactorum, each attack hundreds of different plant host species, whereas others are more restricted. Some of the crops where Phytophthora infections cause the greatest financial losses include potato, soybean, tomato, alfalfa, tobacco, peppers, cucurbits, pineapple, strawberry, raspberry and a wide range of perennial tree crops, especially citrus, avocado, almonds, walnuts, apples and cocoa, and they also heavily affect the ornamental, nursery and forestry industries. The economic damage overall to crops in the United States by Phytophthora species is estimated in the tens of billions of dollars, including the costs of control measures, and worldwide it is many times this amount (1). In the northern midwest of the U.S., P. sojae causes $200 million in annual losses to soybean alone, and worldwide causes around $1-2 billion in losses per year. P. infestans infections resulted in the Irish potato famine last century and continue to be a difficult and worsening problem for potato and tomato growers worldwide, with worldwide costs estimated at
The Journal of Neuroscience | 1999
Daniel A. Butts; Marla B. Feller; Carla J. Shatz; Daniel Rokhsar
The Journal of Neuroscience | 2001
Daniel A. Butts; Daniel Rokhsar