

Publications

Featured research published by Lauren Milechin.


Asilomar Conference on Signals, Systems and Computers | 2015

Sampling operations on big data

Vijay Gadepally; Taylor Herr; Luke B. Johnson; Lauren Milechin; Maja Milosavljevic; Benjamin A. Miller

The 3Vs - Volume, Velocity, and Variety - of Big Data continue to pose a large challenge for systems and algorithms designed to store, process, and disseminate information for discovery and exploration under real-time constraints. Common signal processing operations such as sampling and filtering, which have been used for decades to compress signals, are often undefined for data characterized by heterogeneity, high dimensionality, and a lack of known structure. In this article, we describe and demonstrate an approach to sampling large datasets such as social media data. We evaluate the effect of sampling on a common predictive analytic: link prediction. Our results indicate that even heavily sampled datasets can still yield meaningful link prediction results.
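The sampling-then-prediction workflow the abstract describes can be sketched as a toy. This is illustrative only, not the paper's implementation: it uniformly samples an edge list and scores candidate links with the common-neighbors heuristic, a standard link-prediction baseline.

```python
import random
from collections import defaultdict

def sample_edges(edges, rate, seed=0):
    """Uniformly sample a fraction of edges from a graph's edge list."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < rate]

def common_neighbor_scores(edges):
    """Score each non-adjacent node pair by its number of common
    neighbors, a standard link-prediction baseline."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    scores = {}
    nodes = list(adj)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in adj[u]:
                scores[(u, v)] = len(adj[u] & adj[v])
    return scores

# Tiny example graph; in the paper the input is social media data.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
sampled = sample_edges(edges, 0.5, seed=42)
scores = common_neighbor_scores(edges)
```

Running the same scoring on `sampled` instead of `edges` shows how much signal survives the sampling, which is the comparison the paper evaluates.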


IEEE High Performance Extreme Computing Conference | 2016

Benchmarking SciDB data import on HPC systems

Siddharth Samsi; Laura J. Brattain; David Bestor; Bill Bergeron; Chansup Byun; Vijay Gadepally; Matthew Hubbell; Michael Jones; Anna Klein; Peter Michaleas; Lauren Milechin; Julie Mullen; Andrew Prout; Antonio Rosa; Charles Yee; Jeremy Kepner; Albert Reuther

SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced in-database analytics, reducing the need to extract data for analysis. It is designed to be massively parallel and can run on commodity hardware in a high performance computing (HPC) environment. In this paper, we present the performance of SciDB using simulated image data. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a cluster running the MIT SuperCloud software stack. A peak performance of 2.2M database inserts per second was achieved on a single node of this system. We also show that SciDB and the D4M toolbox provide more efficient ways to access random sub-volumes of massive datasets compared to the traditional approach of reading volumetric data from individual files. This work describes the D4M and SciDB tools we developed and presents the initial performance results. This performance was achieved by using parallel inserts, an in-database merging of arrays, and supercomputing techniques such as distributed arrays and single-program-multiple-data programming.
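A minimal sketch of the parallel-insert pattern described above, using a plain Python dict as a stand-in for SciDB and batched concurrent inserts in place of the D4M database connector (all names here are illustrative, not the real D4M or SciDB API):

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(rows, size):
    """Split a list of (key, value) rows into fixed-size batches."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def insert_batch(store, batch):
    """Merge one batch into the store. Duplicate keys keep the latest
    value, standing in for an in-database merge of arrays."""
    store.update(batch)
    return len(batch)

# Simulated image data: one value per pixel key.
rows = [(f"pixel_{i}", i % 256) for i in range(1000)]
store = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(lambda b: insert_batch(store, b),
                           chunk(rows, 100)))
```

The real benchmark parallelizes inserts across cluster nodes rather than threads, but the batching-and-merge structure is the same idea.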


IEEE High Performance Extreme Computing Conference | 2016

Scalability of VM provisioning systems

Michael Jones; Bill Arcand; Bill Bergeron; David Bestor; Chansup Byun; Lauren Milechin; Vijay Gadepally; Matthew Hubbell; Jeremy Kepner; Peter Michaleas; Julie Mullen; Andy Prout; Tony Rosa; Siddharth Samsi; Charles Yee; Albert Reuther

Virtual machines and virtualized hardware have been around for over half a century. The commoditization of the x86 platform and its rapidly growing hardware capabilities have led to recent exponential growth in the use of virtualization, both in the enterprise and in high performance computing (HPC). The startup time of a virtualized environment is a key performance metric for HPC, where the runtime of any individual task is typically much shorter than the lifetime of a virtualized service in an enterprise context. In this paper, a methodology for accurately measuring startup performance on an HPC system is described. The startup performance overhead of three of the most mature, widely deployed cloud management frameworks (OpenStack, OpenNebula, and Eucalyptus) is measured to determine their suitability for workloads typically seen in an HPC environment. A 10x performance difference is observed between the fastest (Eucalyptus) and the slowest (OpenNebula) framework. This time difference is primarily due to delays waiting on networking in the cloud-init portion of startup. The methodology and measurements presented should facilitate the optimization of startup across a variety of virtualization environments.
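The measurement idea, timing from the launch request until a readiness probe succeeds, can be sketched as below. The `launch` and `is_ready` callables are hypothetical stand-ins for a real cloud framework's API, and the simulated boot delay is invented for the example:

```python
import time

def measure_startup(launch, is_ready, timeout=30.0, poll=0.01):
    """Time from issuing the launch request until the instance first
    answers a readiness probe."""
    start = time.monotonic()
    handle = launch()
    deadline = start + timeout
    while time.monotonic() < deadline:
        if is_ready(handle):
            return time.monotonic() - start
        time.sleep(poll)
    raise TimeoutError("instance never became ready")

# Simulate an instance that becomes ready about 50 ms after launch.
boot_at = time.monotonic() + 0.05
elapsed = measure_startup(lambda: "vm-1",
                          lambda handle: time.monotonic() >= boot_at)
```

Repeating this over many launches and frameworks yields the startup distributions the paper compares.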


IEEE High Performance Extreme Computing Conference | 2016

Enhancing HPC security with a user-based firewall

Andrew Prout; David Bestor; Bill Bergeron; Chansup Byun; Vijay Gadepally; Matthew Hubbell; Michael Houle; Michael Jones; Peter Michaleas; Lauren Milechin; Julie Mullen; Antonio Rosa; Siddharth Samsi; Albert Reuther; Jeremy Kepner

High Performance Computing (HPC) systems traditionally allow their users unrestricted use of their internal network. While this network is normally controlled enough to guarantee privacy without the need for encryption, it does not provide a method to authenticate peer connections. Protocols built upon this internal network, such as those used in MPI, Lustre, Hadoop, or Accumulo, must provide their own authentication at the application layer. Many methods have been employed to perform this authentication, such as operating system privileged ports, Kerberos, munge, TLS, and PKI certificates. However, each of these methods requires the HPC application developer to include support and the user to configure and enable the service. The user-based firewall capability we have prototyped enables a set of rules governing connections across the HPC internal network to be put in place using Linux netfilter. By using an operating system-level capability, the system is not reliant on any developer or user actions to enable security. The rules we have chosen and implemented are crafted so as not to impact the vast majority of users and to be completely invisible to them. Additionally, we have measured the performance impact of this system under various workloads.
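As a rough illustration of the netfilter approach, the sketch below builds per-user iptables rule strings using the `owner` match, which restricts a user's outbound connections to approved ports. The helper name and the rule set are invented for the example; the paper's actual rules are more elaborate:

```python
def owner_rules(user, allowed_ports):
    """Build iptables rule strings (netfilter 'owner' match) that allow a
    user's outbound TCP connections only on approved ports, dropping the
    rest. Illustrative only; not the rule set from the paper."""
    rules = [
        f"-A OUTPUT -m owner --uid-owner {user} -p tcp --dport {port} -j ACCEPT"
        for port in allowed_ports
    ]
    # Default-deny for this user after the per-port accepts.
    rules.append(f"-A OUTPUT -m owner --uid-owner {user} -j DROP")
    return rules

rules = owner_rules("alice", [22, 8020])
```

Because the kernel enforces these rules, no application-layer changes by developers or users are needed, which is the point the abstract makes.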


IEEE High Performance Extreme Computing Conference | 2017

MIT SuperCloud portal workspace: Enabling HPC web application deployment

Andrew Prout; David Bestor; Bill Bergeron; Chansup Byun; Vijay Gadepally; Matthew Hubbell; Michael Houle; Michael Jones; Peter Michaleas; Lauren Milechin; Julie Mullen; Antonio Rosa; Siddharth Samsi; Albert Reuther; Jeremy Kepner

The MIT SuperCloud Portal Workspace enables the secure exposure of web services running on high performance computing (HPC) systems. The portal allows users to run any web application as an HPC job and access it from their workstation, while providing authentication, encryption, and access control at the system level to prevent unintended access. This capability permits users to seamlessly utilize existing and emerging tools that present their user interface as a website on an HPC system, creating a portal workspace. Performance measurements indicate that the MIT SuperCloud Portal Workspace incurs marginal overhead compared to a direct connection to the same service.


IEEE High Performance Extreme Computing Conference | 2017

D4M 3.0: Extended database and language capabilities

Lauren Milechin; Vijay Gadepally; Siddharth Samsi; Jeremy Kepner; Alexander Chen; Dylan Hutchison

The D4M tool was developed to address many of today's data needs. This tool is used by hundreds of researchers to perform complex analytics on unstructured data. Over the past few years, the D4M toolbox has evolved to support connectivity with a variety of new database engines, including SciDB. D4M-Graphulo provides the ability to do graph analytics in the Apache Accumulo database. Finally, an implementation using the Julia programming language is also now available. In this article, we describe some of our latest additions to the D4M toolbox and our upcoming D4M 3.0 release. We show through benchmarking and scaling results that we can achieve fast SciDB ingest using the D4M-SciDB connector, that using Graphulo can enable graph algorithms at scales that would otherwise be memory limited, and that the Julia implementation of D4M achieves performance comparable to, or exceeding, that of the existing MATLAB® implementation.
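The associative-array model at the heart of D4M can be illustrated with a minimal sketch. This is not the real D4M API, just a toy showing the key idea: sparse arrays indexed by strings, with missing entries reading as zero and element-wise arithmetic between arrays.

```python
class Assoc:
    """Minimal sparse associative array in the spirit of D4M: rows and
    columns are strings, lookups of absent entries return zero.
    Illustrative only, not the D4M implementation."""

    def __init__(self, triples):
        # triples: iterable of (row, column, value)
        self.data = {(r, c): v for r, c, v in triples}

    def __getitem__(self, rc):
        return self.data.get(rc, 0)

    def __add__(self, other):
        # Element-wise sum over the union of stored entries.
        keys = set(self.data) | set(other.data)
        return Assoc([(r, c, self[(r, c)] + other[(r, c)])
                      for r, c in keys])

A = Assoc([("doc1", "word:graph", 2), ("doc2", "word:hpc", 1)])
B = Assoc([("doc1", "word:graph", 3)])
C = A + B
```

The string-keyed design is what lets the same analytic code run against in-memory arrays or database backends such as Accumulo and SciDB.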


bioRxiv | 2017

Detecting Pathogen Exposure During the Non-Symptomatic Incubation Period Using Physiological Data

Albert Swiston; Lauren Milechin

Early pathogen exposure detection allows better patient care and faster implementation of public health measures (patient isolation, contact tracing). Existing exposure detection most frequently relies on overt clinical symptoms, namely fever, during the infectious prodromal period. We have developed a robust machine learning based method to better detect asymptomatic states during the incubation period using subtle, sub-clinical physiological markers. Starting with high-resolution physiological waveform data from non-human primate studies of viral (Ebola, Marburg, Lassa, and Nipah viruses) and bacterial (Y. pestis) exposure, we processed the data to reduce short-term variability and normalize diurnal variations, then provided these to a supervised random forest classification algorithm and a post-classifier declaration logic step to reduce false alarms. In most subjects detection is achieved well before the onset of fever; subject cross-validation across exposure studies (varying viruses, exposure routes, animal species, and target dose) leads to 51h mean early detection (at 0.93 area under the receiver-operating characteristic curve [AUCROC]). Evaluating the algorithm against entirely independent datasets for Lassa, Nipah, and Y. pestis exposures, unused in algorithm training and development, yields a mean 51h early warning time (at AUCROC=0.95). We discuss which physiological indicators are most informative for early detection and options for extending this capability to limited datasets such as those available from wearable, non-invasive, ECG-based sensors.
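The smoothing and post-classifier declaration-logic steps can be sketched as below. This toy replaces the random forest with a simple threshold on a moving average, and the window size, threshold, and persistence count are invented for illustration:

```python
def smooth(series, window):
    """Trailing moving average to reduce short-term variability in a
    physiological signal."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(sum(series[lo:i + 1]) / (i + 1 - lo))
    return out

def declare(flags, k):
    """Declaration logic: raise an alarm only after k consecutive
    positive classifier outputs, suppressing isolated false alarms.
    Returns the alarm time index, or None if no alarm fires."""
    run = 0
    for t, flag in enumerate(flags):
        run = run + 1 if flag else 0
        if run >= k:
            return t
    return None

# Toy heart-rate series with a rise partway through (made-up numbers).
hr = [60, 61, 59, 60, 75, 76, 77, 90, 91, 92]
flags = [x > 70 for x in smooth(hr, 3)]  # threshold stands in for the classifier
alarm_at = declare(flags, 2)
```

The real pipeline applies this debouncing to random forest outputs over multi-channel waveform features, but the false-alarm-suppression structure is the same.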


arXiv: Learning | 2018

Sparse Deep Neural Network Exact Solutions

Jeremy Kepner; Vijay Gadepally; Hayden Jananthan; Lauren Milechin; Siddharth Samsi


arXiv: Distributed, Parallel, and Cluster Computing | 2018

Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis

Albert Reuther; Jeremy Kepner; Chansup Byun; Siddharth Samsi; David Bestor; Bill Bergeron; Vijay Gadepally; Michael Houle; Matthew Hubbell; Michael Jones; Anna Klein; Lauren Milechin; Julia S. Mullen; Andrew Prout; Antonio Rosa; Charles Yee; Peter Michaleas


International Parallel and Distributed Processing Symposium | 2018

Design, Generation, and Validation of Extreme Scale Power-Law Graphs

Jeremy Kepner; Siddharth Samsi; David Bestor; Bill Bergeron; Tim Davis; Vijay Gadepally; Michael Houle; Matthew Hubbell; Hayden Jananthan; Michael Jones; Anna Klein; Peter Michaleas; Roger A. Pearce; Lauren Milechin; Julie Mullen; Andrew Prout; Antonio Rosa; Geoffrey Sanders; Charles Yee; Albert Reuther

Collaboration


Dive into Lauren Milechin's collaborations.

Top Co-Authors

Jeremy Kepner (Massachusetts Institute of Technology)
Vijay Gadepally (Massachusetts Institute of Technology)
Siddharth Samsi (Massachusetts Institute of Technology)
Albert Reuther (Massachusetts Institute of Technology)
Bill Bergeron (Massachusetts Institute of Technology)
David Bestor (Massachusetts Institute of Technology)
Matthew Hubbell (Massachusetts Institute of Technology)
Peter Michaleas (Massachusetts Institute of Technology)
Andrew Prout (Massachusetts Institute of Technology)
Antonio Rosa (Massachusetts Institute of Technology)