Khushbu Agarwal

Pacific Northwest National Laboratory

Publication

Featured research published by Khushbu Agarwal.

Bioinformatics | 2010

Machine learning based prediction for peptide drift times in ion mobility spectrometry

Anuj R. Shah; Khushbu Agarwal; Erin S. Baker; Mudita Singhal; Anoop Mayampurath; Yehia M. Ibrahim; Lars J. Kangas; Matthew E. Monroe; Rui Zhao; Mikhail E. Belov; Gordon A. Anderson; Richard D. Smith

MOTIVATION: Ion mobility spectrometry (IMS) has gained significant traction over the past few years for rapid, high-resolution separations of analytes based upon gas-phase ion structure, with significant potential impacts in the field of proteomic analysis. IMS coupled with mass spectrometry (MS) affords multiple improvements over traditional proteomics techniques, such as the elucidation of secondary structure information, the identification of post-translational modifications, and higher identification rates with reduced experiment times. The high-throughput nature of this technique benefits from accurate calculation of cross sections, mobilities and associated drift times of peptides, thereby enhancing downstream data analysis. Here, we present a model that uses physicochemical properties of peptides to accurately predict a peptide's drift time directly from its amino acid sequence. This model is used in conjunction with two mathematical techniques, a partial least squares regression and a support vector regression setting. RESULTS: When tested on an experimentally created high-confidence database of 8675 peptide sequences with measured drift times, both techniques statistically significantly outperform the intrinsic size parameters-based calculations, the currently held practice in the field, on all charge states (+2, +3 and +4). AVAILABILITY: The software executable, imPredict, is available for download from http://omics.pnl.gov/software/imPredict.php CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

modeling, analysis, and simulation on computer and telecommunication systems | 2014

Synchronization Algorithms for Co-simulation of Power Grid and Communication Networks

Selim Ciraci; Jeffrey A. Daily; Khushbu Agarwal; Jason C. Fuller; Laurentiu D. Marinovici; Andrew R. Fisher

The ongoing modernization of power grids consists of integrating them with communication networks in order to achieve robust and resilient control of grid operations. To understand the operation of the new smart grid, one approach is to use simulation software. Unfortunately, current power grid simulators at best utilize inadequate approximations to simulate communication networks, if at all. Cooperative simulation of specialized power grid and communication network simulators promises to more accurately reproduce the interactions of real smart grid deployments. However, co-simulation is a challenging problem. A co-simulation must manage the exchange of information, including the synchronization of simulator clocks, between all simulators while maintaining adequate computational performance. This paper describes two new conservative algorithms for reducing the overhead of time synchronization, namely Active Set Conservative and Reactive Conservative. We provide a detailed analysis of their performance characteristics with respect to the current state of the art including both conservative and optimistic synchronization algorithms. In addition, we provide guidelines for selecting the appropriate synchronization algorithm based on the requirements of the co-simulation. The newly proposed algorithms are shown to achieve as much as 14% and 63% improvement in performance, respectively, over the existing conservative algorithm.
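As a rough illustration of what conservative time synchronization means in a co-simulation, the sketch below advances a set of toy simulators together to the smallest next-event time any of them reports. It is a generic baseline only, not the Active Set Conservative or Reactive Conservative algorithms from the paper, and `ToySimulator` and `run_conservative` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToySimulator:
    """Hypothetical stand-in for a power grid or network simulator."""
    name: str
    event_times: list          # sorted times at which this simulator has work
    now: float = 0.0
    processed: list = field(default_factory=list)

    def next_event_time(self):
        # Earliest pending event, or infinity when nothing is left.
        return self.event_times[0] if self.event_times else float("inf")

    def advance_to(self, t):
        # Process every local event with timestamp <= t, then move the clock.
        while self.event_times and self.event_times[0] <= t:
            self.processed.append(self.event_times.pop(0))
        self.now = t


def run_conservative(simulators):
    """Advance all simulators to the global minimum next-event time each round.

    No simulator runs past a time at which another simulator could still
    produce a message, so no rollback is ever needed.
    Returns the number of synchronization rounds used.
    """
    rounds = 0
    while True:
        granted = min(s.next_event_time() for s in simulators)
        if granted == float("inf"):
            return rounds
        for s in simulators:
            s.advance_to(granted)
        rounds += 1


grid = ToySimulator("grid", [1.0, 4.0, 9.0])
net = ToySimulator("network", [2.0, 4.0])
rounds = run_conservative([grid, net])
```

In this toy setup every round costs one global synchronization, which is the kind of overhead the abstract says the proposed algorithms aim to reduce.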

Computing in Science and Engineering | 2014

Reveal: An Extensible Reduced-Order Model Builder for Simulation and Modeling

Khushbu Agarwal; Poorva Sharma; Jinliang Ma; Chaomei Lo; Ian Gorton; Yan Liu

Many science domains need computationally efficient and accurate representations, known as reduced-order models (ROMs), of high-fidelity, computationally expensive simulations. This article presents the design and implementation of the Reveal toolset, a ROM builder that generates ROMs based on science- and engineering-domain-specific simulations executed on high-performance computing (HPC) platforms. The toolset encompasses a range of sampling and regression methods for ROM generation, automatically quantifies ROM accuracy, and supports an iterative approach to improve ROM accuracy. Reveal is designed to be extensible for any simulator that has published input and output formats. It also defines programmatic interfaces to include new sampling and regression techniques so users can mix and match mathematical techniques best suited to their model characteristics. The article describes the architecture of Reveal and demonstrates its use with a computational fluid dynamics model used in carbon capture.
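The sample, fit, quantify, and iterate loop described above can be sketched in a few lines. This is a toy under stated assumptions, not Reveal's interface: `expensive_simulation` is a hypothetical stand-in for an HPC run, uniform sampling is assumed, and piecewise-linear interpolation is swapped in for the regression step to keep the example dependency-free.

```python
import math


def expensive_simulation(x):
    """Hypothetical stand-in for a costly HPC simulation run."""
    return math.sin(x) + 0.1 * x


def build_rom(samples):
    """Return a piecewise-linear surrogate through the sampled (x, y) points."""
    points = sorted(samples)

    def rom(x):
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("x outside sampled range")

    return rom


def max_error(rom, lo, hi, checks=200):
    """Quantify ROM accuracy as the worst absolute error on a validation grid."""
    grid = [lo + (hi - lo) * i / checks for i in range(checks + 1)]
    return max(abs(rom(x) - expensive_simulation(x)) for x in grid)


def iterate_rom(lo, hi, tolerance, start=3, max_rounds=10):
    """Refine the uniform sample set until the ROM meets the tolerance."""
    n = start
    history = []
    for _ in range(max_rounds):
        xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
        rom = build_rom([(x, expensive_simulation(x)) for x in xs])
        err = max_error(rom, lo, hi)
        history.append((n, err))
        if err <= tolerance:
            break
        n = 2 * n - 1  # keep the old sample locations and add midpoints
    return rom, history


rom, history = iterate_rom(0.0, 6.0, tolerance=0.01)
```

`history` records the sample count and measured error of each round, so the accuracy gained per additional batch of simulation runs can be inspected directly.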

international conference on cluster computing | 2015

Large Scale Frequent Pattern Mining Using MPI One-Sided Model

Abhinav Vishnu; Khushbu Agarwal

In this paper, we propose a work-stealing runtime, Library for Work Stealing (LibWS), that uses the MPI one-sided model to design a scalable version of FP-Growth, the de facto frequent pattern mining algorithm, on large-scale systems. LibWS provides locality-efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for the FP-Growth data exchange phase, which reduces the communication complexity from the state-of-the-art Θ(p) to Θ(f + p/f), for p processes and f frequent attributed-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of FP-Growth on LibWS using 4096 processes on an InfiniBand cluster demonstrates excellent efficiency for several work distributions (91% efficiency for Power-law and 93% for Poisson). The proposed distributed FPTree merging algorithm provides a 38x communication speedup on 4096 cores.
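For readers unfamiliar with work stealing, the following single-threaded simulation shows the basic idea: each worker executes tasks from its own queue, and an idle worker takes a task from the opposite end of a loaded worker's queue. It does not use MPI and does not reflect the LibWS API; `run_work_stealing` and its victim-selection rule (steal from the most loaded worker) are assumptions made for illustration.

```python
from collections import deque


def run_work_stealing(initial_tasks):
    """Simulate work stealing among workers in a single thread.

    initial_tasks: one list of tasks per worker.
    Each step, a worker with local work pops from the tail of its own deque;
    an idle worker first steals one task from the head of the most loaded deque.
    Returns (done, steals): tasks completed per worker and total steal count.
    """
    queues = [deque(tasks) for tasks in initial_tasks]
    done = [0] * len(queues)
    steals = 0
    while any(queues):
        for worker, queue in enumerate(queues):
            if not queue:
                # Idle: pick the most loaded victim and steal from its head.
                victim = max(range(len(queues)), key=lambda i: len(queues[i]))
                if not queues[victim]:
                    continue  # nothing left anywhere
                queue.append(queues[victim].popleft())
                steals += 1
            queue.pop()  # execute one local task (owner works from the tail)
            done[worker] += 1
    return done, steals


# Skewed start: worker 0 holds all eight tasks, the others hold none.
done, steals = run_work_stealing([[1] * 8, [], [], []])
```

Even with all work initially on one worker, the completed-task counts end up balanced across the four workers, at the cost of six steals.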

international conference on cluster computing | 2011

Implementing High Performance Remote Method Invocation in CCA

Jian Yin; Khushbu Agarwal; Manoj Kumar Krishnan; Daniel G. Chavarría-Miranda; Ian Gorton; Thomas Epperly

We report our effort in engineering a high performance remote method invocation (RMI) mechanism for the Common Component Architecture (CCA). This mechanism provides a highly efficient and easy-to-use mechanism for distributed computing in CCA, enabling CCA applications to effectively leverage parallel systems to accelerate computations. This work builds on the previous work of Babel RMI. Babel is a high performance language interoperability tool that is used in CCA for scientific application writers to share, reuse, and compose applications from software components written in different programming languages. Babel provides a transparent and flexible RMI framework for distributed computing. However, the existing Babel RMI implementation is built on top of TCP and does not provide the level of performance required to distribute fine-grained tasks. We observed that the main reason the TCP-based RMI does not perform well is that it does not efficiently utilize the high performance interconnect hardware on a cluster. We have implemented a high performance RMI protocol, HPCRMI. HPCRMI achieves low latency by building on top of a low-level portable communication library, Aggregated Remote Message Copy Interface (ARMCI), and minimizing communication for each RMI call. Our design allows an RMI operation to be completed by only two RDMA operations. We also aggressively optimize our system to reduce copying. In this paper, we discuss the design and our experimental evaluation of this protocol. Our experimental results show that our protocol can improve RMI performance by an order of magnitude.

many task computing on grids and supercomputers | 2011

Design and implementation of "many parallel task" hybrid subsurface model

Khushbu Agarwal; Jared M. Chase; Karen L. Schuchardt; Timothy D. Schiebe; Bruce J. Palmer; Todd O. Elsethagen

Continuum scale models have been used to study subsurface flow, transport, and reactions for many years. Recently, pore scale models, which operate at scales of individual soil grains, have been developed to more accurately model pore scale phenomena, such as precipitation, that may not be well represented at the continuum scale. However, particle-based models become prohibitively expensive for modeling realistic domains. Instead, we are developing a hybrid model that simulates the full domain at continuum scale and applies the pore model only to areas of high reactivity. The hybrid model uses a dimension reduction approach to formulate the mathematical exchange of information across scales. Since the location, size, and number of pore regions in the model vary, an adaptive Pore Generator is being implemented to define pore regions at each iteration. A fourth code will provide data transformation from the pore scale back to the continuum scale. These components are coupled into a single hybrid model using the Swift workflow system. Our hybrid model workflow simulates a kinetic controlled mixing reaction in which multiple pore-scale simulations occur for every continuum scale time step. Each pore-scale simulation is itself parallel, thus exhibiting multi-level parallelism. Our workflow manages these multiple parallel tasks simultaneously, with the number of tasks changing across iterations. It also supports dynamic allocation of job resources and visualization processing at each iteration. We discuss the design, implementation and challenges associated with building a scalable, Many Parallel Task, hybrid model to run efficiently on thousands to tens of thousands of processors.

web search and data mining | 2018

Percolator: Scalable Pattern Discovery in Dynamic Graphs

Sutanay Choudhury; Sumit Purohit; Peng Lin; Yinghui Wu; Lawrence B. Holder; Khushbu Agarwal

We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate incremental and parallel pattern mining; and (3) support analytical queries such as trend analysis. The core idea of Percolator is to dynamically decide and verify a small fraction of patterns and their instances that must be inspected in response to buffered updates in dynamic graphs, with a total mining cost independent of graph size. We demonstrate (a) the feasibility of incremental pattern mining by walking through each component of Percolator, (b) the efficiency and scalability of Percolator over the sheer size of real-world dynamic graphs, and (c) how the user-friendly GUI of Percolator interacts with users to support keyword-based queries that detect, browse and inspect trending patterns. We demonstrate how Percolator effectively supports event and trend analysis in social media streams and research publications, respectively.

international parallel and distributed processing symposium | 2017

High-Performance Data Analytics Beyond the Relational and Graph Data Models with GEMS

Vito Giovanni Castellana; Marco Minutoli; Shreyansh Bhatt; Khushbu Agarwal; Arthur Bleeker; John Feo; Daniel G. Chavarría-Miranda; David J. Haglin

Graphs represent an increasingly popular data model for data analytics, since they can naturally represent relationships and interactions between entities. Relational databases and their pure table-based data model are not well suited to storing and processing sparse data. Consequently, graph databases have gained interest in the last few years and the Resource Description Framework (RDF) became the standard data model for graph data. Nevertheless, while RDF is well suited to analyzing the relationships between entities, it is not efficient in representing their attributes and properties. In this work we propose the adoption of a new hybrid data model, based on attributed graphs, that aims at overcoming the limitations of the pure relational and graph data models. We present how we have re-designed the GEMS data-analytics framework to take full advantage of the proposed hybrid data model. To improve analysts' productivity, in addition to a C++ API for application development, we adopt GraQL as the input query language. We validate our approach by implementing a set of queries on net-flow data and we compare our framework's performance against Neo4j. Experimental results show significant performance improvement over Neo4j, up to several orders of magnitude when increasing the size of the input data.

ieee/acm international symposium cluster, cloud and grid computing | 2013

Scalable PGAS Metadata Management on Extreme Scale Systems

Daniel G. Chavarría-Miranda; Khushbu Agarwal; Tjerk P. Straatsma

Programming models intended to run on exascale systems have a number of challenges to overcome, especially the sheer size of the system as measured by the number of concurrent software entities created and managed by the underlying runtime. It is clear from the size of these systems that any state maintained by the programming model has to be strictly sub-linear in size, in order not to overwhelm memory usage with pure overhead. A principal feature of Partitioned Global Address Space (PGAS) models is providing easy access to global-view distributed data structures. In order to provide efficient access to these distributed data structures, PGAS models must keep track of metadata such as where array sections are located with respect to processes/threads running on the HPC system. As PGAS models and applications become ubiquitous on very large trans-petascale systems, a key component of their performance and scalability will be efficient and judicious use of memory for model overhead (metadata) compared to application data. We present an evaluation of several strategies to manage PGAS metadata that exhibit different space/time tradeoffs. We use two real-world PGAS applications to capture metadata usage patterns and gain insight into their communication behavior.
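To make the space/time tradeoff in array-location metadata concrete, the sketch below contrasts two hypothetical directories for a block-distributed global array: one stores the owner of every block in a table, the other recomputes the owner from the distribution rule. These are illustrative assumptions (a round-robin block distribution, invented class names), not the strategies evaluated in the paper.

```python
class TableDirectory:
    """Stores the owner of every block explicitly: O(num_blocks) memory."""

    def __init__(self, num_elements, block_size, num_procs):
        self.block_size = block_size
        num_blocks = -(-num_elements // block_size)  # ceiling division
        # Round-robin assignment of blocks to processes.
        self.owners = [b % num_procs for b in range(num_blocks)]

    def owner_of(self, index):
        return self.owners[index // self.block_size]

    def metadata_entries(self):
        return len(self.owners)


class ComputedDirectory:
    """Recomputes the owner from the distribution rule: O(1) memory."""

    def __init__(self, num_elements, block_size, num_procs):
        self.block_size = block_size
        self.num_procs = num_procs

    def owner_of(self, index):
        return (index // self.block_size) % self.num_procs

    def metadata_entries(self):
        return 2  # only block_size and num_procs are stored


table = TableDirectory(num_elements=1_000_000, block_size=100, num_procs=64)
computed = ComputedDirectory(num_elements=1_000_000, block_size=100, num_procs=64)
same = all(table.owner_of(i) == computed.owner_of(i)
           for i in range(0, 1_000_000, 997))
```

The table can describe irregular distributions but its size grows with the number of blocks, while the computed variant keeps constant-size state and only works for distributions that follow a fixed rule.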

DEVS '14 Proceedings of the Symposium on Theory of Modeling & Simulation - DEVS Integrative | 2014