Disa Mhembere
Johns Hopkins University
Publication
Featured research published by Disa Mhembere.
IEEE Global Conference on Signal and Information Processing | 2013
William Gray Roncal; Zachary H. Koterba; Disa Mhembere; Dean M. Kleissas; Joshua T. Vogelstein; Randal C. Burns; Anita R. Bowles; Dimitrios K. Donavos; Sephira G. Ryman; Rex E. Jung; Lei Wu; Vince D. Calhoun; R. Jacob Vogelstein
Currently, connectomes (e.g., functional or structural brain graphs) can be estimated in humans at ≈1 mm³ scale using a combination of diffusion weighted magnetic resonance imaging, functional magnetic resonance imaging and structural magnetic resonance imaging scans. This manuscript summarizes a novel, scalable implementation of open-source algorithms to rapidly estimate magnetic resonance connectomes, using both anatomical regions of interest (ROIs) and voxel-size vertices. To assess the reliability of our pipeline, we develop a novel non-parametric non-Euclidean reliability metric. Here we provide an overview of the methods used, demonstrate our implementation, and discuss available user extensions. We conclude with results showing the efficacy and reliability of the pipeline over the previous state of the art.
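The reliability metric itself is only named above; as a rough illustration of what a non-parametric, rank-based test-retest check on graphs can look like, the sketch below scores, for each scan, how often other subjects' graphs are farther away than the same subject's repeat scan. The Frobenius distance and the scoring rule are assumptions chosen for illustration, not the metric from the paper.

```python
import numpy as np

def retest_reliability(graphs, subject_ids):
    """Rank-based test-retest reliability over adjacency matrices.

    graphs      : list of (n x n) numpy adjacency matrices on the same node set.
    subject_ids : one subject label per graph; subjects are assumed to have
                  at least two scans.
    Returns the mean normalized rank of within-subject distances
    (1.0 = perfectly reliable, 0.5 = chance).
    """
    m = len(graphs)
    # Pairwise graph distances; the Frobenius norm is an assumption here.
    d = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            d[i, j] = d[j, i] = np.linalg.norm(graphs[i] - graphs[j])

    scores = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        same = [j for j in others if subject_ids[j] == subject_ids[i]]
        for j in same:
            # Fraction of unrelated scans that are farther than the repeat scan.
            rank = np.mean([d[i, k] > d[i, j] for k in others if k != j])
            scores.append(rank)
    return float(np.mean(scores))
```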
IEEE Global Conference on Signal and Information Processing | 2013
Disa Mhembere; William Gray Roncal; Daniel L. Sussman; Carey E. Priebe; Rex E. Jung; Sephira G. Ryman; R. Jacob Vogelstein; Joshua T. Vogelstein; Randal C. Burns
Graphs are quickly emerging as a leading abstraction for the representation of data. One important application domain originates from an emerging discipline called “connectomics”. Connectomics studies the brain as a graph; vertices correspond to neurons (or collections thereof) and edges correspond to structural or functional connections between them. To explore the variability of connectomes, and thereby address both basic science questions regarding the structure of the brain and medical questions in psychiatry and neurology, one can study the topological properties of these brain graphs. We define multivariate glocal graph invariants: features that capture various local and global topological properties of a graph. We show that this collection of features can be computed via a combination of daisy-chaining, sparse matrix representation and computations, and efficient approximations. Our custom open-source Python package serves as a back-end to a Web service that we have created to enable researchers to upload graphs and download the corresponding invariants in a number of different formats. Moreover, we built this package to support distributed processing on multicore machines. It is therefore an enabling technology for network science, lowering the barrier to entry by providing tools to biologists and analysts who otherwise lack these capabilities. As a demonstration, we run our code on 120 brain graphs, each with approximately 16M vertices and up to 90M edges.
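As a rough illustration of what "glocal" invariants look like in code, the sketch below computes a few local and global features from a sparse adjacency matrix, reusing an intermediate product in the spirit of the daisy-chaining described above. The particular invariants (degree, per-vertex triangle counts, a scan statistic, and top eigenvalues) are illustrative choices, not the package's exact feature set or API.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def glocal_invariants(A, k_eigs=6):
    """Illustrative local + global invariants of an undirected graph.

    A : scipy.sparse adjacency matrix (symmetric, unweighted).
    The invariants and the reuse pattern below are a sketch only.
    """
    A = A.tocsr()
    deg = np.asarray(A.sum(axis=1)).ravel()           # local: degree

    A2 = A @ A                                         # reused ("daisy-chained") below
    # local: triangles through each vertex = row sum of (A^2 ∘ A) / 2
    tri = np.asarray(A2.multiply(A).sum(axis=1)).ravel() / 2.0

    # local: scan statistic = edges in each vertex's closed neighborhood
    scan1 = deg + tri

    # global: edge count and approximate top-k adjacency eigenvalues
    n_edges = A.nnz // 2
    top_eigs = eigsh(A.asfptype(), k=min(k_eigs, A.shape[0] - 1),
                     return_eigenvectors=False)

    return {"degree": deg, "triangles": tri, "scan1": scan1,
            "n_edges": n_edges, "top_eigenvalues": top_eigs}
```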
IEEE Transactions on Parallel and Distributed Systems | 2017
Da Zheng; Disa Mhembere; Vince Lyzinski; Joshua T. Vogelstein; Carey E. Priebe; Randal C. Burns
Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse matrix dense matrix multiplication (SpMM) in a semi-external memory (SEM) fashion; i.e., we keep the sparse matrix on commodity SSDs and the dense matrices in memory. Our SEM-SpMM incorporates many in-memory optimizations for large power-law graphs. It outperforms the in-memory implementations of Trilinos and Intel MKL and scales to billion-node graphs, far beyond the limitations of memory. Furthermore, on a single large parallel machine, our SEM-SpMM operates as fast as the distributed implementations of Trilinos using five times as much processing power. We also run our implementation in memory (IM-SpMM) to quantify the overhead of keeping data on SSDs. SEM-SpMM achieves almost 100 percent of the performance of IM-SpMM on graphs when the dense matrix has more than four columns, and at least 65 percent of its performance on all inputs. We apply our SpMM to three important data analysis tasks (PageRank, eigensolving, and non-negative matrix factorization) and show that our SEM implementations significantly advance the state of the art.
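A minimal sketch of the semi-external-memory idea, assuming the sparse matrix has been pre-split into row blocks on disk (here as .npz files written once with scipy.sparse.save_npz) while the dense matrix stays entirely in memory. The real system's asynchronous I/O, NUMA awareness, and power-law-graph optimizations are not reproduced here.

```python
import numpy as np
import scipy.sparse as sp

def sem_spmm(block_paths, dense):
    """Semi-external-memory SpMM sketch.

    block_paths : ordered list of .npz files, each holding one CSR row block
                  of the sparse matrix (the on-disk layout is an assumption).
    dense       : (n x k) numpy array, kept entirely in memory.
    Returns the (m x k) product, computed one row block at a time so only a
    single block of the sparse matrix is resident at any moment.
    """
    out_blocks = []
    for path in block_paths:
        block = sp.load_npz(path)          # stream one row block from SSD
        out_blocks.append(block @ dense)   # in-memory multiply for this block
        del block                          # release it before the next read
    return np.vstack(out_blocks)
```

In this scheme the I/O cost is one pass over the sparse matrix per multiply, while the dense matrix and the output remain memory-resident, which is the trade-off the abstract describes.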
High Performance Distributed Computing | 2017
Disa Mhembere; Da Zheng; Carey E. Priebe; Joshua T. Vogelstein; Randal C. Burns
k-means is one of the most influential and widely used machine learning algorithms. Its computation limits the performance and scalability of many statistical analysis and machine learning tasks. We rethink and optimize k-means for modern NUMA architectures to develop a novel parallelization scheme that delays and minimizes synchronization barriers. The k-means NUMA Optimized Routine (knor) library has (i) in-memory (knori), (ii) distributed memory (knord), and (iii) semi-external memory (knors) modules that radically improve the performance of k-means for varying memory and hardware budgets. knori boosts performance for single-machine datasets by an order of magnitude or more. knors improves the scalability of k-means on a memory budget using SSDs, and scales to billions of points on a single machine using a fraction of the resources that distributed in-memory systems require. knord retains knori's performance characteristics while scaling in-memory computation through distributed execution in the cloud. knor modifies Elkan's triangle-inequality pruning algorithm so that it can be used on billion-point datasets without the significant memory overhead of the original algorithm. We demonstrate that knor outperforms distributed commercial products like H2O, Turi (formerly Dato, GraphLab), and Spark's MLlib by more than an order of magnitude for datasets of 10^7 to 10^9 points.
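The core of Elkan-style pruning is the triangle inequality: if a point's distance to its current centroid is at most half the distance between that centroid and another centroid, the other centroid cannot be closer and need not be checked. The sketch below applies that filter in a single assignment pass while keeping only one upper bound per point rather than the full O(n·k) lower-bound table whose memory cost the abstract mentions; it illustrates the pruning idea only, not knor's actual implementation or its NUMA-aware parallelization.

```python
import numpy as np

def assign_with_pruning(X, centers, assign, upper):
    """One k-means assignment pass with triangle-inequality pruning.

    X       : (n, d) data points
    centers : (k, d) current centroids
    assign  : (n,) current assignments
    upper   : (n,) valid upper bounds on each point's distance to its
              assigned centroid (e.g., the exact distances from the
              previous iteration, inflated by how far centroids moved)
    """
    k = centers.shape[0]
    # Pairwise centroid distances; s[c] = half the distance to c's nearest other centroid.
    cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(cc, np.inf)
    s = 0.5 * cc.min(axis=1)

    for i in range(X.shape[0]):
        c = assign[i]
        if upper[i] <= s[c]:
            continue                                   # no other centroid can be closer
        d_cur = np.linalg.norm(X[i] - centers[c])      # tighten the bound
        for j in range(k):
            if j == c or cc[c, j] >= 2.0 * d_cur:
                continue                               # triangle inequality rules j out
            d_j = np.linalg.norm(X[i] - centers[j])
            if d_j < d_cur:
                d_cur, c = d_j, j
        assign[i], upper[i] = c, d_cur
    return assign, upper
```

A caller could initialize the bounds exactly, e.g. `upper = np.linalg.norm(X - centers[assign], axis=1)`, and then reuse them across iterations.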
bioRxiv | 2018
Gregory Kiar; Eric Bridgeford; Vikram Chandrashekhar; Disa Mhembere; Randal C. Burns; William Gray Roncal; Joshua T. Vogelstein
Modern scientific discovery depends on collecting large heterogeneous datasets with many sources of variability and applying domain-specific pipelines from which one can draw insight or clinical utility. For example, macroscale connectomics studies require complex pipelines to process raw functional or diffusion data and estimate connectomes. Individual studies tend to customize pipelines to their needs, raising concerns about their reproducibility and adding to a longer list of factors that may differ across studies and result in failures to replicate (including sampling, experimental design, and data acquisition protocols). Mitigating these issues requires multi-study datasets and the development of pipelines that can be applied across them. We developed NeuroData's MRI to Graphs (NDMG) pipeline using several functional and diffusion studies, including the Consortium for Reliability and Reproducibility, to estimate connectomes. Without any manual intervention or parameter tuning, NDMG ran on 25 different studies (≈6,000 scans) from 19 sites, with each scan resulting in a biologically plausible connectome (as assessed by multiple quality assurance metrics at each processing stage). For each study, the connectomes from NDMG are more similar within than across individuals, indicating that NDMG preserves biological variability. Moreover, the connectomes exhibit near-perfect consistency for certain connectional properties across every scan, individual, study, site, and modality; these include stronger ipsilateral than contralateral connections and stronger homotopic than heterotopic connections. Yet the magnitude of these differences varied across individuals and studies, much more so when pooling data across sites, even after controlling for study, site, and basic demographic variables (i.e., age, sex, and ethnicity). This indicates that other experimental variables (possibly those not measured or reported) are contributing to this variability, which, if not accounted for, can limit the value of aggregate datasets as well as expectations regarding the accuracy of findings and the likelihood of replication. We therefore provide a set of principles to guide the development of pipelines capable of pooling data across studies while maintaining biological variability and minimizing measurement error. This open science approach provides us with an opportunity to understand and eventually mitigate spurious results for both past and future studies.

The connectivity of the human brain is fundamental to understanding the principles of cognitive function and the mechanisms by which it can go awry. To that extent, tools for estimating human brain networks are required for single-participant, group-level, and cross-study analyses. We have developed an open-source, cloud-enabled, turn-key pipeline that operates on (groups of) raw diffusion and structural magnetic resonance imaging data, estimating brain networks (connectomes) across 24 different spatial scales, with quality assurance visualizations at each stage of processing. Running a harmonized analysis on 10 different datasets comprising 2,295 subjects and 2,861 scans reveals that the connectomes across datasets are similar on coarse scales but quantitatively different on fine scales. Our framework therefore illustrates that while general principles of human brain organization may be preserved across experiments, obtaining reliable p-values and clinical biomarkers from connectomics will require further harmonization efforts.
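As an illustration of the kind of connectional consistency check described above (stronger ipsilateral than contralateral connections, stronger homotopic than heterotopic connections), the sketch below summarizes those four quantities for a single weighted connectome. The hemisphere labeling, the homotopic pairing convention, and the mean-weight summary are assumptions for illustration, not NDMG's internals.

```python
import numpy as np

def hemispheric_summary(A, hemisphere):
    """Summarize ipsilateral vs. contralateral and homotopic vs. heterotopic
    connection strength in one connectome.

    A          : (n, n) symmetric weighted adjacency matrix.
    hemisphere : length-n array of 'L'/'R' labels for the vertices.
    Homotopic pairs are assumed to be (i, i + n//2), a convention assumed
    here rather than taken from the pipeline.
    """
    hemisphere = np.asarray(hemisphere)
    n = A.shape[0]
    same = hemisphere[:, None] == hemisphere[None, :]
    off_diag = ~np.eye(n, dtype=bool)

    n_half = n // 2
    homo = np.zeros((n, n), dtype=bool)
    idx = np.arange(n_half)
    homo[idx, idx + n_half] = True
    homo[idx + n_half, idx] = True

    return {
        "ipsilateral": A[same & off_diag].mean(),    # same hemisphere
        "contralateral": A[~same].mean(),            # opposite hemispheres
        "homotopic": A[homo].mean(),                 # mirrored region pairs
        "heterotopic": A[~same & ~homo].mean(),      # cross-hemisphere, non-mirrored
    }
```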
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2018
Da Zheng; Disa Mhembere; Joshua T. Vogelstein; Carey E. Priebe; Randal C. Burns
R is one of the most popular programming languages for statistics and machine learning, but it is slow and unable to scale to large datasets. The general approach for writing an efficient algorithm in R is to implement it in C or FORTRAN and provide an R wrapper. FlashR accelerates and scales existing R code by parallelizing a large number of matrix functions in the R base package and scaling them beyond memory capacity with solid-state drives (SSDs). FlashR performs memory-hierarchy-aware execution to speed up parallelized R code by (i) evaluating matrix operations lazily, (ii) performing all operations in a DAG in a single execution, with only one pass over the data, to increase the ratio of computation to I/O, and (iii) performing two levels of matrix partitioning and reordering computation on matrix partitions to reduce data movement in the memory hierarchy. We evaluate FlashR on various machine learning and statistics algorithms on inputs of up to four billion data points. Despite the huge performance gap between SSDs and RAM, FlashR on SSDs closely tracks the performance of FlashR in memory for many algorithms. The R implementations in FlashR outperform H2O and Spark MLlib by a factor of 3 to 20.
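FlashR itself is an R package, but the execution strategy it describes (building a DAG of deferred matrix operations and evaluating the fused expression in one pass over partitioned data) can be sketched in a few lines of Python. The class below illustrates that strategy only; it is not FlashR's API.

```python
import numpy as np

class Lazy:
    """Deferred elementwise matrix expression: operators build a small DAG
    instead of materializing intermediates, and evaluate() runs the whole
    fused expression partition by partition in a single pass over the data.
    """
    def __init__(self, fn, shape):
        self.fn, self.shape = fn, shape    # fn(lo, hi) -> values for rows [lo, hi)

    @classmethod
    def wrap(cls, arr):
        return cls(lambda lo, hi, a=arr: a[lo:hi], arr.shape)

    def _combine(self, other, op):
        o = other if isinstance(other, Lazy) else Lazy(lambda lo, hi, v=other: v, self.shape)
        return Lazy(lambda lo, hi: op(self.fn(lo, hi), o.fn(lo, hi)), self.shape)

    def __add__(self, other): return self._combine(other, np.add)
    def __mul__(self, other): return self._combine(other, np.multiply)

    def evaluate(self, partition_rows=4096):
        """Single pass: each row partition flows through the fused expression."""
        out = np.empty(self.shape)
        for lo in range(0, self.shape[0], partition_rows):
            hi = min(lo + partition_rows, self.shape[0])
            out[lo:hi] = self.fn(lo, hi)
        return out
```

For example, `(Lazy.wrap(X) * 2.0 + 1.0).evaluate()` touches each partition of X exactly once instead of materializing the intermediate `X * 2.0`, which is the fusion effect the abstract attributes to lazy, DAG-based evaluation.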
File and Storage Technologies | 2015
Da Zheng; Disa Mhembere; Randal C. Burns; Joshua T. Vogelstein; Carey E. Priebe; Alexander S. Szalay
Archive | 2016
Gregory Kiar; Eric Bridgeford; Joshua T. Vogelstein; William Gray Roncal; Randal C. Burns; Disa Mhembere
arXiv: Distributed, Parallel, and Cluster Computing | 2016
Da Zheng; Disa Mhembere; Vince Lyzinski; Joshua T. Vogelstein; Carey E. Priebe; Randal C. Burns
arXiv: Performance | 2018
James Browne; Tyler Tomita; Disa Mhembere; Randal C. Burns; Joshua T. Vogelstein