Michael R. Crusoe | Researchain

Publication

Featured research published by Michael R. Crusoe.

F1000Research | 2015

The khmer software package: enabling efficient nucleotide sequence analysis

Michael R. Crusoe; Hussien Alameldin; Sherine Awad; Elmar Boucher; Adam Caldwell; Reed A. Cartwright; Amanda Charbonneau; Bede Constantinides; Greg Edvenson; Scott Fay; Jacob Fenton; Thomas Fenzl; Jordan A. Fish; Leonor Garcia-Gutierrez; Phillip Garland; Jonathan Gluck; Iván González; Sarah Guermond; Jiarong Guo; Aditi Gupta; Joshua R. Herr; Adina Howe; Alex Hyer; Andreas Härpfer; Luiz Irber; Rhys Kidd; David Lin; Justin Lippi; Tamer Mansour; Pamela McA'Nulty

The khmer package is a freely available software library for working efficiently with fixed length DNA words, or k-mers. khmer provides implementations of a probabilistic k-mer counting data structure, a compressible De Bruijn graph representation, De Bruijn graph partitioning, and digital normalization. khmer is implemented in C++ and Python, and is freely available under the BSD license at https://github.com/dib-lab/khmer/.
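To make the idea of probabilistic k-mer counting concrete, the following is a minimal sketch, not khmer's implementation or API. It assumes a Count-Min Sketch as the counting structure (one common choice for approximate counting). The names `kmers` and `CountMinSketch`, and the `width` and `depth` parameters, are hypothetical and chosen only for illustration.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping length-k word (k-mer) of a DNA string."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class CountMinSketch:
    """Approximate counter using fixed memory (illustrative, not khmer's code).

    Estimates can be too high because of hash collisions,
    but they are never lower than the true count.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        # One table of counters per hash function.
        self.tables = [[0] * width for _ in range(depth)]

    def _slots(self, item):
        # Derive one column per row by salting the hash with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item):
        for row, col in self._slots(item):
            self.tables[row][col] += 1

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(self.tables[row][col] for row, col in self._slots(item))


sketch = CountMinSketch()
for word in kmers("ACGTACGT", 4):
    sketch.add(word)

# "ACGT" occurs twice in the input, so its estimate is at least 2.
print(sketch.estimate("ACGT"))
```

The trade-off in this kind of structure is fixed memory use in exchange for occasional overcounting; increasing `width` or `depth` in the sketch above lowers the chance that collisions inflate an estimate.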

Journal of Open Research Software | 2016

Walking the Talk: Adopting and Adapting Sustainable Scientific Software Development processes in a Small Biology Lab

Michael R. Crusoe; C. Titus Brown

The khmer software project provides both research and production functionality for large-scale nucleic-acid sequence analysis. The software implements several novel data structures and algorithms that perform data pre-filtering for common bioinformatics tasks, including sequence mapping and de novo assembly. Development is driven by a small lab with one full-time developer (MRC), as well as several graduate students and a professor (CTB) who contribute regularly to research features. Here we describe our efforts to bring better design, testing, and more open development to the khmer software project as of version 1.1. The khmer software is developed openly at http://github.com/dib-lab/khmer/.

Genome Biology and Evolution | 2018

Comparative Genomics Reveals Accelerated Evolution in Conserved Pathways during the Diversification of Anole Lizards

Marc Tollis; Elizabeth D. Hutchins; Jessica Stapley; Shawn M. Rupp; Walter L. Eckalbar; Inbar Maayan; Eris Lasku; Carlos R. Infante; Stuart R. Dennis; Joel Robertson; Catherine M. May; Michael R. Crusoe; Eldredge Bermingham; Dale F. DeNardo; Shi Tong Tonia Hsieh; Rob J. Kulathinal; William Owen McMillan; Douglas B. Menke; Stephen C. Pratt; Jeffery Alan Rawls; Oris Sanjur; Jeanne Wilson-Rawls; Melissa A. Wilson Sayres; Rebecca E. Fisher; Kenro Kusumi

Squamates include all lizards and snakes, and display some of the most diverse and extreme morphological adaptations among vertebrates. However, compared with birds and mammals, relatively few resources exist for comparative genomic analyses of squamates, hampering efforts to understand the molecular bases of phenotypic diversification in such a speciose clade. In particular, the ∼400 species of anole lizard represent an extensive squamate radiation. Here, we sequence and assemble the draft genomes of three anole species (Anolis frenatus, Anolis auratus, and Anolis apletophallus) for comparison with the available reference genome of Anolis carolinensis. Comparative analyses reveal a rapid background rate of molecular evolution consistent with a model of punctuated equilibrium, and strong purifying selection on functional genomic elements in anoles. We find evidence for accelerated evolution in genes involved in behavior, sensory perception, and reproduction, as well as in genes regulating limb bud development and hindlimb specification. Morphometric analyses of anole fore and hindlimbs corroborated these findings. We detect signatures of positive selection across several genes related to the development and regulation of the forebrain, hormones, and the iguanian lizard dewlap, suggesting molecular changes underlying behavioral adaptations known to reinforce species boundaries were a key component in the diversification of anole lizards.

Journal of Open Research Software | 2016

Channeling Community Contributions to Scientific Software: A Sprint Experience

Michael R. Crusoe; C. Titus Brown

In 2014, the khmer software project participated in a two-day global sprint coordinated by the Mozilla Science Lab. We offered a mentored experience in contributing to a scientific software project for anyone who was interested. We provided entry-level tasks and worked with contributors as they worked through our development process. The experience was successful on both a social and a technical level, bringing in 13 contributions from 9 new contributors and validating our development process. In this experience paper we describe the sprint preparation and process, relate anecdotal experiences, and draw conclusions about what other projects could do to enable a similar outcome. The khmer software is developed openly at http://github.com/dib-lab/khmer/.

AAS Open Research | 2018

Organizing and running bioinformatics hackathons within Africa: The H3ABioNet cloud computing experience

Azza Elgaili Ahmed; Phelelani T. Mpangase; Sumir Panji; Shakuntala Baichoo; Gerrit Botha; Faisal M. Fadlelmola; Scott Hazelhurst; Peter van Heusden; C. Victor Jongeneel; Fourie Joubert; Liudmila Sergeevna Mainzer; Ayton Meintjes; Don Armstrong; Michael R. Crusoe; Brian O'Connor; Yassine Souilmi; Mustafa Alghali; Shaun Aron; Hocine Bendou; Eugene De Beste; Mamana Mbiyavanga; Oussema Souiai; Long Yi; Jennie Zermeno; Nicola Mulder

The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Health and Heredity in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous compute environments, and for the nurturing of technical expertise in workflow languages and containerization technologies. To address this need, in 2016 H3ABioNet arranged its first Cloud Computing and Reproducible Workflows Hackathon, with the purpose of building key genomics analysis pipelines able to run on heterogeneous computing environments and meeting the needs of H3Africa research projects. This paper describes the preparations for this hackathon and reflects upon the lessons learned about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on quay.io.

bioRxiv | 2017

Enabling Precision Medicine via standard communication of NGS provenance, analysis, and results

Gil Alterovitz; Dennis A. Dean; Carole A. Goble; Michael R. Crusoe; Stian Soiland-Reyes; Amanda Bell; Anais Hayes; Charles Hadley King; Elaine Johanson; Elaine E. Thompson; Eric Donaldson; Hsinyi S Tsang; Jeremy Goecks; Jonas S. Almeida; Lydia Guo; Mark Walderhaug; Paul Walsh; Robel Kahsay; Toby Bloom; Yuching Lai; Vahan Simonyan; Raja Mazumder

Precision medicine can be empowered by a personalized approach to patient care based on the patient’s or pathogen’s unique genomic sequence. For precision medicine, genomic findings must be robust and reproducible, and experimental data capture should adhere to FAIR guiding principles. Moreover, precision medicine requires standardized reporting that extends beyond wet lab procedures to computational methods. Rapidly developing standardization technologies can improve communication and reporting of genomic sequence data analysis steps by utilizing concepts defined in the BioCompute framework, such as error domain, usability domain, verification kit, and provenance domain. These advancements allow data provenance to be standardized and promote interoperability. Thus, a resulting bioinformatics computation instance that includes these advancements can be easily communicated, repeated and compared by scientists, regulators, test developers and clinicians. Easing the burden of doing the aforementioned tasks greatly extends the range of practical application. Advancing clinical trials, precision medicine, and regulatory submissions requires an umbrella of standards that not only fuses these elements, but also ensures efficient communication and documentation of genomic analyses. The BioCompute paradigm and the resulting BioCompute Objects (BCOs) offer that umbrella. Through standardized bundling of high-throughput sequencing studies under BCOs, regulatory agencies (e.g., FDA), test developers, researchers, and clinicians can expand collaboration to drive innovation in precision medicine, with the potential for decreasing the time and cost associated with next generation sequencing workflow exchange, reporting, and regulatory reviews.

F1000Research | 2017

Using bio.tools to generate and annotate workbench tool descriptions

Kenzo-Hugo Hillion; Ivan Kuzmin; Anton Khodak; Eric Rasche; Michael R. Crusoe; Hedi Peterson; Jon Ison; Hervé Ménager

Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks facilitate access to bioinformatics tools in a user-friendly, scalable and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time-consuming and error-prone process. A major consequence is the incomplete or outdated description of tools that are often missing important information, including parameters and metadata such as publication or links to documentation. ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools - which have been registered in the ELIXIR tools registry (https://bio.tools) - into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins, and generates a skeleton for a Galaxy XML or CWL tool description. The second module is dedicated to the enrichment of the generated tool description, using metadata provided by bio.tools. This last module can also be used on its own to complete or correct existing tool descriptions with missing metadata.

Data Science and Engineering | 2017

Robust Cross-Platform Workflows: How Technical and Scientific Communities Collaborate to Develop, Test and Share Best Practices for Data Analysis

Steffen Möller; Stuart W. Prescott; Lars Wirzenius; Petter Reinholdtsen; Brad Chapman; Pjotr Prins; Stian Soiland-Reyes; Fabian Klötzl; Andrea Bagnacani; Matúš Kalaš; Andreas Tille; Michael R. Crusoe

Information integration and workflow technologies for data analysis have always been major fields of investigation in bioinformatics. A range of popular workflow suites are available to support analyses in computational biology. Commercial providers tend to offer prepared applications remote to their clients. However, for most academic environments with local expertise, novel data collection techniques or novel data analysis, it is essential to have all the flexibility of open-source tools and open-source workflow descriptions. Workflows in data-driven science such as computational biology have considerably gained in complexity. New tools or new releases with additional features arrive at an enormous pace, and new reference data or concepts for quality control are emerging. A well-abstracted workflow and the exchange of the same across work groups have an enormous impact on the efficiency of research and the further development of the field. High-throughput sequencing adds to the avalanche of data available in the field; efficient computation and, in particular, parallel execution motivate the transition from traditional scripts and Makefiles to workflows. We here review the extant software development and distribution model with a focus on the role of integration testing and discuss the effect of common workflow language on distributions of open-source scientific software to swiftly and reliably provide the tools demanded for the execution of such formally described workflows. It is contended that, alleviated from technical differences for the execution on local machines, clusters or the cloud, communities also gain the technical means to test workflow-driven interaction across several software packages.

figshare | 2016

Common Workflow Language, v1.0

Peter Amstutz; Michael R. Crusoe; Nebojša Tijanić; Brad Chapman; John Chilton; Michael Heuer; Andrey V. Kartashov; Dan Leehr; Hervé Ménager; Maya Nedeljkovich; Matt Scales; Stian Soiland-Reyes; Luka Stojanovic

F1000Research | 2015