Saba Sehrish
Fermilab
Publications
Featured research published by Saba Sehrish.
Astronomy and Computing | 2015
Joe Zuntz; Marc Paterno; Elise Jennings; Douglas H. Rudd; A. Manzotti; Scott Dodelson; Sarah Bridle; Saba Sehrish; James Kowalkowski
Cosmological parameter estimation is entering a new era. Large collaborations need to coordinate high-stakes analyses using multiple methods; furthermore, such analyses have grown in complexity due to sophisticated models of cosmology and systematic uncertainties. In this paper we argue that modularity is the key to addressing these challenges: calculations should be broken up into interchangeable modular units with inputs and outputs clearly defined. We present a new framework for cosmological parameter estimation, CosmoSIS, designed to connect together, share, and advance development of inference tools across the community. We describe the modules already available in CosmoSIS, including CAMB, Planck, cosmic shear calculations, and a suite of samplers. We illustrate it using demonstration code that you can run out-of-the-box with the installer available at this http URL.
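The modular idea can be sketched outside CosmoSIS itself. The snippet below is a minimal Python illustration (module names, parameters, and the shared data block are hypothetical, not the CosmoSIS API) of a pipeline built from interchangeable units whose inputs and outputs are clearly defined.

```python
# Minimal sketch of a modular parameter-estimation pipeline (hypothetical names,
# not the CosmoSIS API): each module reads named inputs from a shared data block
# and writes named outputs, so modules can be swapped independently.

def background_module(block):
    # Toy "theory" stage: derive a summary quantity from the input parameters.
    block["growth_proxy"] = block["omega_m"] ** 0.55 / block["h0"]

def likelihood_module(block):
    # Toy likelihood comparing the derived quantity to a fake observation.
    observed, sigma = 0.45, 0.05
    resid = (block["growth_proxy"] - observed) / sigma
    block["loglike"] = -0.5 * resid ** 2

def run_pipeline(params, modules):
    block = dict(params)        # the shared data block
    for module in modules:
        module(block)           # each module touches only its named entries
    return block["loglike"]

print(run_pipeline({"h0": 0.7, "omega_m": 0.3},
                   [background_module, likelihood_module]))
```

A sampler would call run_pipeline repeatedly with proposed parameter values; swapping in a different likelihood only requires matching the named entries it reads and writes.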
New Astronomy | 2016
Salman Habib; Adrian Pope; Hal Finkel; Nicholas Frontiere; Katrin Heitmann; David Daniel; Patricia K. Fasel; Vitali A. Morozov; George Zagaris; Tom Peterka; Venkatram Vishwanath; Zarija Lukić; Saba Sehrish; Wei-keng Liao
Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the ‘Dark Universe’, dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC’s design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.
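The core N-body step that such codes scale up can be illustrated with a toy direct-summation sketch. This is purely illustrative (HACC's actual solvers combine particle-mesh and short-range methods and are vastly more scalable); the particle count, units, and softening length here are arbitrary.

```python
import numpy as np

# Toy direct-summation N-body step (illustrative only, not HACC code).
def accelerations(pos, mass, soft=1e-2, G=1.0):
    diff = pos[None, :, :] - pos[:, None, :]          # pairwise separations
    dist2 = (diff ** 2).sum(-1) + soft ** 2           # softened squared distances
    inv_r3 = dist2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                     # no self-interaction
    return G * (diff * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

def leapfrog_step(pos, vel, mass, dt):
    # Kick-drift-kick integrator, the standard choice for gravitational N-body.
    acc = accelerations(pos, mass)
    vel_half = vel + 0.5 * dt * acc
    pos_new = pos + dt * vel_half
    vel_new = vel_half + 0.5 * dt * accelerations(pos_new, mass)
    return pos_new, vel_new

rng = np.random.default_rng(0)
pos = rng.uniform(-1, 1, (256, 3))
vel = np.zeros((256, 3))
mass = np.ones(256) / 256
pos, vel = leapfrog_step(pos, vel, mass, dt=0.01)
```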
IEEE Transactions on Parallel and Distributed Systems | 2013
Saba Sehrish; Grant Mackey; Pengju Shang; Jun Wang; John M. Bent
Current High Performance Computing (HPC) applications have seen an explosive growth in the size of data in recent years. Many application scientists have initiated efforts to integrate data-intensive computing into compute-intensive HPC facilities, particularly for data analytics. We have observed several scientific applications which must migrate their data from an HPC storage system to a data-intensive one for analytics. There is a gap between the data semantics of HPC storage and data-intensive systems; hence, once migrated, the data must be further refined and reorganized. This reorganization must be performed before existing data-intensive tools such as MapReduce can be used to analyze the data, and it requires at least two complete scans through the data set and then at least one MapReduce program to prepare the data before analyzing it. Running multiple MapReduce phases causes significant overhead for the application in the form of excessive I/O operations: for every MapReduce phase, a distributed read and write operation on the file system must be performed. Our contribution is a MapReduce-based framework for HPC analytics that eliminates the multiple scans and also reduces the number of data preprocessing MapReduce programs. We also implement a data-centric scheduler to further improve the performance of HPC analytics MapReduce programs by maintaining data locality. We have added additional expressiveness to the MapReduce language to allow application scientists to specify the logical semantics of their data such that 1) the data can be analyzed without running multiple data preprocessing MapReduce programs, and 2) the data can be simultaneously reorganized as it is migrated to the data-intensive file system. Using our augmented MapReduce system, MapReduce with Access Patterns (MRAP), we have demonstrated up to 33 percent throughput improvement in one real application, and up to 70 percent in an I/O kernel of another application. Our results for scheduling show up to 49 percent improvement for an I/O kernel of a prevalent HPC analysis application.
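The idea of attaching data semantics to a MapReduce job can be illustrated loosely with the toy driver below; the layout hint, function names, and record format are hypothetical and do not reflect the actual MRAP interface.

```python
from collections import defaultdict
import struct

# Loose illustration of carrying a data-layout hint into a MapReduce job
# (hypothetical API, not MRAP itself): the mapper runs directly on raw
# fixed-size records, so no separate reorganization pass is needed.

def run_mapreduce(chunks, mapper, reducer, layout):
    intermediate = defaultdict(list)
    size = struct.calcsize(layout)
    for chunk in chunks:
        # Decode fixed-size records according to the caller-supplied layout hint.
        for off in range(0, len(chunk) - size + 1, size):
            record = struct.unpack_from(layout, chunk, off)
            for key, value in mapper(record):
                intermediate[key].append(value)
    return {k: reducer(k, vs) for k, vs in intermediate.items()}

# Example: records are (timestep:int, temperature:double); average per timestep.
layout = "id"
data = [struct.pack(layout, 0, 1.5) + struct.pack(layout, 0, 2.5),
        struct.pack(layout, 1, 3.0)]
result = run_mapreduce(
    data,
    mapper=lambda rec: [(rec[0], rec[1])],
    reducer=lambda k, vs: sum(vs) / len(vs),
    layout=layout,
)
print(result)   # {0: 2.0, 1: 3.0}
```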
Grid Computing Environments | 2014
Ryan Chard; Saba Sehrish; Alex Rodriguez; Ravi K. Madduri; Thomas D. Uram; Marc Paterno; Katrin Heitmann; Shreyas Cholia; Jim Kowalkowski; Salman Habib
Accessing and analyzing data from cosmological simulations is a major challenge due to the prohibitive size of cosmological datasets and the diversity of the associated large-scale analysis tasks. Analysis of the simulated models requires direct access to the datasets, considerable compute infrastructure, and storage capacity for the results. Resource limitations can become serious obstacles to performing research on the most advanced cosmological simulations. The Portal for Data Analysis services for Cosmological Simulations (PDACS) is a web-based workflow service and scientific gateway for cosmology. The PDACS platform provides access to shared repositories for datasets, analytical tools, cosmological workflows, and the infrastructure required to perform a wide variety of analyses. PDACS is a repurposed implementation of the Galaxy workflow engine and supports a rich collection of cosmology-specific datatypes and tools. The platform leverages high-performance computing infrastructure at the National Energy Research Scientific Computing Center (NERSC) and Argonne National Laboratory (ANL), enabling researchers to deploy computationally intensive workflows. In this paper we present PDACS and discuss the process and challenges of developing a research platform for cosmological research.
Proceedings of the 20th European MPI Users' Group Meeting | 2013
Saba Sehrish; Seung Woo Son; Wei-keng Liao; Alok N. Choudhary; Karen L. Schuchardt
In this paper, we propose a multi-buffer pipelining approach to improve collective I/O performance by overlapping the dominant request aggregation phase with the I/O phase in the two-phase I/O implementation. Our pipelining method first divides the collective buffer into a group of smaller buffers for an individual collective I/O call and then pipelines the asynchronous communication that exchanges the I/O requests among processes with the requests issued to the file system. Our performance evaluation on a representative I/O benchmark and a production application shows a 20% improvement in I/O time, given a theoretical upper bound of 50% when both phases completely overlap.
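A greatly simplified sketch of the overlap follows. Python threads stand in for the asynchronous MPI communication used inside the MPI-IO library, and the function names and chunking scheme are illustrative rather than the paper's implementation.

```python
import threading

# Simplified sketch of multi-buffer pipelining for two-phase collective I/O:
# the collective buffer is split into smaller pieces, and the write of one
# piece overlaps the request exchange for the next.

def exchange(piece):
    # Stand-in for the communication/aggregation phase.
    return list(piece)

def write(piece, out):
    # Stand-in for the I/O phase (e.g., writing to the parallel file system).
    out.extend(piece)

def pipelined_collective_write(data, num_buffers, out):
    size = (len(data) + num_buffers - 1) // num_buffers
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    pending = None
    for piece in pieces:
        aggregated = exchange(piece)      # exchange piece k while piece k-1 writes
        if pending is not None:
            pending.join()                # wait for the previous write to finish
        pending = threading.Thread(target=write, args=(aggregated, out))
        pending.start()
    if pending is not None:
        pending.join()

out = []
pipelined_collective_write(list(range(16)), num_buffers=4, out=out)
print(out)
```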
EuroMPI'11: Proceedings of the 18th European MPI Users' Group Conference on Recent Advances in the Message Passing Interface | 2011
Chen Jin; Saba Sehrish; Wei-keng Liao; Alok N. Choudhary; Karen L. Schuchardt
In collective I/O, MPI processes exchange requests so that the rearranged requests can result in the shortest file system access time. Scheduling the exchange sequence determines the response time of the participating processes. Existing implementations that simply follow the increasing order of file offsets do not necessarily produce the best performance. To minimize the average response time, we propose three scheduling algorithms that consider the number of processes per file stripe and the number of accesses per process. Our experimental results demonstrate improvements of up to 50% in the average response time using two synthetic benchmarks and a high-resolution climate application.
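None of the three algorithms is reproduced here, but the flavor of reordering the exchange can be sketched with a simple shortest-queue-first heuristic; this ordering rule is a hypothetical illustration, not one of the paper's schedulers.

```python
from collections import defaultdict

# Illustrative exchange-ordering heuristic: serve requests of lightly loaded
# processes first, which tends to lower the average response time (analogous
# to shortest-job-first scheduling).

def schedule_exchanges(requests):
    # requests: list of (process, stripe) pairs
    per_process = defaultdict(int)
    for proc, _ in requests:
        per_process[proc] += 1
    # Sort by each process's total access count; break ties by stripe index.
    return sorted(requests, key=lambda r: (per_process[r[0]], r[1]))

reqs = [(0, 2), (0, 5), (0, 7), (1, 2), (2, 3)]
print(schedule_exchanges(reqs))
# [(1, 2), (2, 3), (0, 2), (0, 5), (0, 7)]
```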
arXiv: Distributed, Parallel, and Cluster Computing | 2017
Oliver Gutsche; Jim Pivarski; Jim Kowalkowski; Nhan Tran; A. Svyatkovskiy; Matteo Cremonesi; P. Elmer; Bo Jayatilaka; Saba Sehrish; Cristina Mantilla Suarez
Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems, collectively called Big Data technologies, have emerged to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches, promise a fresh look at the analysis of very large datasets, and could potentially reduce the time-to-physics with increased interactivity. In this talk, we present an active LHC Run 2 analysis, searching for dark matter with the CMS detector, as a testbed for Big Data technologies. We directly compare the traditional NTuple-based analysis with an equivalent analysis using Apache Spark on the Hadoop ecosystem and beyond. In both cases, we start the analysis with the official experiment data formats and produce publication-quality physics plots. We will discuss the advantages and disadvantages of each approach and give an outlook on further studies needed.
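The Spark side of such a comparison might look roughly like the sketch below; the input path, column names, and cuts are hypothetical placeholders rather than the analysis code used in the paper.

```python
from pyspark.sql import SparkSession, functions as F

# Rough sketch of an NTuple-style selection expressed with Spark SQL
# (hypothetical input path, column names, and cuts).
spark = SparkSession.builder.appName("darkmatter-sketch").getOrCreate()

events = spark.read.parquet("hdfs:///datasets/cms_events.parquet")

# Event selection: cuts on missing transverse energy and jet multiplicity.
selected = (events
            .filter(F.col("met") > 200.0)
            .filter(F.col("n_jets") >= 2))

# Histogram-like aggregation: event counts in 50-unit bins of missing energy.
binned = (selected
          .withColumn("met_bin", (F.col("met") / 50).cast("int") * 50)
          .groupBy("met_bin")
          .count()
          .orderBy("met_bin"))

binned.show()
```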
The Journal of Supercomputing | 2017
Seung Woo Son; Saba Sehrish; Wei-keng Liao; Ron A. Oldfield; Alok N. Choudhary
In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. This variability is caused by concurrently running processes or jobs competing for I/O, or by a RAID rebuild when a disk drive fails. We present a mechanism that stripes across a selected subset of I/O nodes with the lightest workload at runtime to achieve the highest I/O bandwidth available in the system. In this paper, we propose a probing mechanism that enables application-level dynamic file striping to mitigate I/O variability. We implement the proposed mechanism in a high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of smaller files and manages access to them, allowing the data to be treated as a single, normal file by users. We demonstrate that our bandwidth probing mechanism can successfully identify temporarily slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC's systems also show that our approach isolates I/O variability effectively on shared systems and improves overall collective I/O performance with less variation.
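The probing step can be illustrated with a small timing sketch. The directory paths, probe size, and keep fraction are made up, and the real mechanism probes I/O nodes from inside the I/O library rather than timing writes to directories.

```python
import os
import tempfile
import time

PROBE_BYTES = 4 * 1024 * 1024  # small probe write; size is arbitrary

def probe_bandwidth(path):
    # Time one small synchronous write to estimate the target's current speed.
    data = os.urandom(PROBE_BYTES)
    probe_file = os.path.join(path, ".probe")
    start = time.perf_counter()
    with open(probe_file, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    os.remove(probe_file)
    return PROBE_BYTES / elapsed           # bytes per second

def pick_stripe_targets(candidates, keep_fraction=0.5):
    # Stripe only across the fastest half of the candidate targets.
    ranked = sorted(candidates, key=probe_bandwidth, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

# Demo with temporary directories standing in for I/O nodes.
dirs = [tempfile.mkdtemp() for _ in range(4)]
print(pick_stripe_targets(dirs))
```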
Computing in Science and Engineering | 2015
Ravi K. Madduri; Alex Rodriguez; Thomas D. Uram; Katrin Heitmann; Tanu Malik; Saba Sehrish; Ryan Chard; Shreyas Cholia; Marc Paterno; Jim Kowalkowski; Salman Habib
PDACS (Portal for Data Analysis Services for Cosmological Simulations) is a Web-based analysis portal that provides access to large simulations and large-scale parallel analysis tools to the research community. It provides opportunities to access, transfer, manipulate, search, and record simulation data, as well as to contribute applications and carry out (possibly complex) computational analyses of the data. PDACS also enables wrapping of analysis tools written in a large number of languages within its workflow system, providing a powerful way to carry out multilevel/multistep analyses. The system allows for cross-layer provenance tracking, implementing a transparent method for sharing workflow specifications, as well as a convenient mechanism for checking reproducibility of results generated by the workflows. Users are able to submit their own tools to the system and to share tools with the rest of the community.
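The provenance idea can be sketched with a toy workflow runner that records each step's tool, parameters, and an output digest so that re-runs can be checked for reproducibility; this is illustrative only and not how PDACS or Galaxy implement provenance.

```python
import hashlib
import json

# Toy sketch of workflow provenance tracking: each step records its tool name,
# parameters, and a digest of its output, so a re-run of the same workflow
# specification can be compared against the recorded digests.

def digest(obj):
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def run_workflow(dataset, steps):
    provenance = []
    data = dataset
    for name, tool, params in steps:
        data = tool(data, **params)
        provenance.append({"step": name,
                           "params": params,
                           "output_digest": digest(data)})
    return data, provenance

# Two stand-in "analysis tools".
def select(data, threshold):
    return [x for x in data if x > threshold]

def rebin(data, width):
    return [round(x / width) * width for x in data]

steps = [("select", select, {"threshold": 2}),
         ("rebin", rebin, {"width": 5})]
result, prov = run_workflow([1, 3, 7, 11], steps)
print(result)                     # [5, 5, 10]
print(json.dumps(prov, indent=2))
```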
International Conference on Cluster Computing | 2013
Seung Woo Son; Saba Sehrish; Wei-keng Liao; Ron A. Oldfield; Alok N. Choudhary
As the number of compute cores on modern parallel machines grows beyond hundreds of thousands, scalable and consistent I/O performance is becoming hard to obtain due to fluctuating file system performance. This fluctuation is often caused by a RAID rebuild after a disk failure or by concurrent jobs competing for I/O. We present a mechanism that stripes across a dynamically selected subset of I/O servers with the lightest workload to achieve the best I/O bandwidth available from the system. We implement this mechanism in an I/O software layer that enables memory-to-file data layout transformation and allows transparent file partitioning. File partitioning is a technique that divides data among a set of files and manages file access, making the data appear as a single file to users. Experimental results on NERSC's Hopper indicate that our approach effectively isolates I/O variation on shared systems and improves overall I/O performance significantly.