Publication


Featured research published by Drew Schmidt.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2012

Tight Coupling of R and Distributed Linear Algebra for High-Level Programming with Big Data

Drew Schmidt; George Ostrouchov; Wei-Chen Chen; Pragneshkumar Patel

We present a new distributed programming extension of the R programming language. By tightly coupling R to the well-known ScaLAPACK and MPI libraries, we are able to achieve highly scalable implementations of common statistical methods, allowing the user to analyze bigger datasets with R than ever before. Early benchmarks show great promise for the project and its future.
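
ScaLAPACK, which this work couples to R, stores each distributed matrix in a two-dimensional block-cyclic layout. Below is a minimal sketch of the index arithmetic along one dimension, assuming 0-based indices and that the first block sits on process 0. The function names are hypothetical. `local_count` follows the logic of ScaLAPACK's NUMROC routine. This illustrates the layout only and is not code from the paper.

```python
# Sketch of ScaLAPACK-style 2-D block-cyclic index arithmetic.
# Assumptions: 0-based indices; the first block of each dimension sits on
# process row/column 0. Function names are illustrative, not a library API.

def owner_and_local(i: int, nb: int, nprocs: int) -> tuple[int, int]:
    """Map global index i along one dimension to (owning process, local index)."""
    block = i // nb                           # which block of size nb contains i
    proc = block % nprocs                     # blocks are dealt round-robin to processes
    local = (block // nprocs) * nb + i % nb   # position within that process's storage
    return proc, local


def local_count(n: int, nb: int, proc: int, nprocs: int) -> int:
    """How many of the n global indices process `proc` stores (NUMROC-style logic)."""
    nblocks = n // nb                    # number of full blocks
    count = (nblocks // nprocs) * nb     # full rounds every process receives
    extra = nblocks % nprocs             # leftover full blocks go to processes 0..extra-1
    if proc < extra:
        count += nb
    elif proc == extra:
        count += n % nb                  # the trailing partial block, if any
    return count


def owner_2d(i, j, mb, nb, nprow, npcol):
    """Apply the 1-D mapping to rows and columns of an nprow x npcol process grid."""
    prow, li = owner_and_local(i, mb, nprow)
    pcol, lj = owner_and_local(j, nb, npcol)
    return (prow, pcol), (li, lj)


if __name__ == "__main__":
    # 10 rows in blocks of 2 over 2 process rows: rows 0,1,4,5,8,9 land on row 0.
    print([local_count(10, 2, p, 2) for p in range(2)])  # [6, 4]
    print(owner_2d(5, 7, 2, 2, 2, 2))                    # ((0, 1), (3, 3))
```

Dealing blocks round-robin keeps the work balanced as a factorization sweeps across the matrix. Each process still stores its share contiguously.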


Big Data Research | 2017

Programming with BIG Data in R: Scaling Analytics from One to Thousands of Nodes

Drew Schmidt; Wei-Chen Chen; Michael A. Matheson; George Ostrouchov

We present a tutorial overview showing how one can achieve scalable performance with R. We do so by utilizing several package extensions, including those from the pbdR project. These packages consist of high-performance, high-level interfaces to and extensions of MPI, PBLAS, ScaLAPACK, I/O libraries, profiling libraries, and more. While these libraries shine brightest on large distributed platforms, they also work rather well on small clusters and often, surprisingly, even on a laptop with only two cores. Our tutorial begins with recommendations on how to get more performance out of your R code before considering parallel implementations. Because R is a high-level language, a single function call can hide a deep hierarchy of operations. For big data, this can easily lead to inefficiency. Profiling is an important tool for understanding the performance of R code, for both serial and parallel improvements. The pbdR packages provide a highly scalable capability for the development of novel distributed data analysis algorithms. This level of scalability is unmatched in other analysis software. Interactive speeds (seconds) are achieved for complex analysis algorithms on data of 100 GB and more. This is possible because the interfaces add little overhead to the scalable libraries and their extensions. Furthermore, this is often achieved with little or no change to serial R code. Our overview includes code of varying complexity, illustrating reading data in parallel, the process of changing serial code to distributed parallel code, and how to engage distributed matrix computation from within R.
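
Turning serial code into distributed code often amounts to computing on local pieces and then combining them with a reduction. As a hedged illustration, here is a sketch of a row-distributed crossproduct XᵀX (R's `crossprod(X)`). Each simulated rank forms the Gram matrix of its own row block, and the blocks are then summed, which is the role an MPI allreduce would play across real processes. The ranks here are simulated inside one Python process, and all function names are hypothetical. This is not pbdR code.

```python
# Sketch: a row-distributed crossproduct t(X) %*% X computed by reduction.
# Ranks are simulated inside one Python process; in a real MPI program each
# rank would hold only its own row block and the final sum would be an
# allreduce. All names here are illustrative.

def split_rows(x, nranks):
    """Give each simulated rank a contiguous block of rows (sizes differ by at most one)."""
    base, extra = divmod(len(x), nranks)
    blocks, start = [], 0
    for r in range(nranks):
        size = base + (1 if r < extra else 0)
        blocks.append(x[start:start + size])
        start += size
    return blocks


def local_gram(rows, ncol):
    """Serial kernel: Gram matrix of the rows one rank owns (zeros if it owns none)."""
    g = [[0] * ncol for _ in range(ncol)]
    for row in rows:
        for a in range(ncol):
            for b in range(ncol):
                g[a][b] += row[a] * row[b]
    return g


def sum_matrices(parts, ncol):
    """Element-wise sum of per-rank results: the role of an MPI allreduce with a sum."""
    return [[sum(m[a][b] for m in parts) for b in range(ncol)] for a in range(ncol)]


def distributed_crossprod(x, nranks):
    """Split rows across ranks, run the unchanged serial kernel on each, then reduce."""
    ncol = len(x[0])
    partials = [local_gram(block, ncol) for block in split_rows(x, nranks)]
    return sum_matrices(partials, ncol)


if __name__ == "__main__":
    x = [[1, 2], [3, 4], [5, 6]]
    print(distributed_crossprod(x, 2))                      # [[35, 44], [44, 56]]
    print(local_gram(x, 2) == distributed_crossprod(x, 5))  # True
```

The serial kernel `local_gram` is untouched. Only the data split and the final sum are new, which is one way a serial routine can be parallelized with little change.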


Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale | 2016

Introducing a New Client/Server Framework for Big Data Analytics with the R Language

Drew Schmidt; Wei-Chen Chen; George Ostrouchov

Historically, large-scale computing and interactivity have been at odds. This is a particularly sore spot for data analytics applications, which are typically interactive in nature. To help address this problem, we introduce a new client/server framework for the R language. This framework allows the R programmer to remotely control anywhere from one to thousands of batch servers running as cooperating instances of R. All of this is done from the user's local R session. Additionally, no specialized software environment is needed; the framework is a series of R packages, available from CRAN. The communication between client and server(s) is handled by the well-known ZeroMQ library. To handle server-side computations, we use our established pbdR packages for large-scale distributed computing. These packages utilize HPC standards like MPI and ScaLAPACK to handle complex, tightly coupled computations on large datasets. In this paper, we outline the new client/server architecture components, discuss the pros and cons of this approach, and provide several example workflows that bring interactivity to potentially terabyte-size computations.


Archive | 2016

Programming with Big Data – Interface to MPI

Wei-Chen Chen; George Ostrouchov; Drew Schmidt; Pragneshkumar Patel; Hao Yu


Archive | 2017

Parallel Statistical Computing with R: An Illustration on Two Architectures

George Ostrouchov; Wei-Chen Chen; Drew Schmidt


Archive | 2016

Programming with Big Data – Scalable Linear Algebra Packages

Wei-Chen Chen; Drew Schmidt; George Ostrouchov; Pragneshkumar Patel


Archive | 2016

Programming with Big Data – Demonstrations and Examples Using 'pbdR' Packages

Drew Schmidt; Wei-Chen Chen; George Ostrouchov; Pragneshkumar Patel


Archive | 2016

Programming with Big Data – Interface to ZeroMQ

Wei-Chen Chen; Drew Schmidt; Christian Heckendorf; George Ostrouchov


Archive | 2014

Programming with Big Data – Interface to Parallel Unidata NetCDF4 Format Data Files

Pragneshkumar Patel; George Ostrouchov; Wei-Chen Chen; Drew Schmidt; David Pierce


Archive | 2014

Programming with Big Data – Demonstrations of pbd Packages

Drew Schmidt; Wei-Chen Chen; George Ostrouchov; Pragneshkumar Patel

Collaboration


Top Co-Authors

George Ostrouchov

Oak Ridge National Laboratory

Wei-Chen Chen

Oak Ridge National Laboratory

Michael A. Matheson

Oak Ridge National Laboratory

Wei Chen Chen

Food and Drug Administration
