Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Sumit Purohit is active.

Publication


Featured research publications by Sumit Purohit.


Proceedings of the Middleware 2011 Industry Track Workshop on | 2011

Scalable real time data management for smart grid

Jian Yin; Anand V. Kulkarni; Sumit Purohit; Ian Gorton; Bora A. Akyol

This paper presents GridMW, a scalable and reliable data middleware layer for smart grids. Smart grids promise to improve the efficiency of power grid systems and reduce greenhouse emissions by incorporating power generation from renewable sources and shaping demand to match supply. As a result, the power grid will become much more dynamic and require constant adjustment, which in turn requires analysis and decision-making applications to improve the efficiency and reliability of smart grid systems. These applications rely on large amounts of data gathered from power generation, transmission, and consumption. To this end, millions of sensors, including phasor measurement units (PMUs) and smart meters, are being deployed across the smart grid. Existing data middleware cannot collect, store, retrieve, and deliver the enormous volume of data from these sensors to analysis and control applications. Most existing data middleware builds on general-purpose software systems so that one solution can serve a range of applications; however, the overheads incurred by generalized APIs cause high latency and unpredictable performance, which prevents achieving near-real-time latencies and high throughput. In our work, by tailoring the system specifically to smart grids, we eliminate much of this overhead while keeping the implementation effort reasonable. This is achieved with a log-structure-inspired architecture that accesses the block device layer directly, eliminating the indirection incurred by high-level file system interfaces. Preliminary results show our system can significantly improve performance compared to traditional systems.
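
The log-structure idea in the abstract, sequential appends with random reads by offset arithmetic, can be illustrated with a minimal sketch. The record format, class, and field names below are hypothetical, not GridMW's actual code:

```python
import struct

# Fixed-size measurement record (hypothetical layout):
# uint32 sensor_id, uint64 timestamp, float64 value -> 20 bytes, no padding.
RECORD = struct.Struct("<IQd")

class AppendLog:
    """Toy log-structured store: records are only ever appended, so writes
    are sequential; reads locate a record by index * record_size."""
    def __init__(self):
        self._buf = bytearray()

    def append(self, sensor_id, timestamp, value):
        """Append one measurement; return its record index."""
        self._buf += RECORD.pack(sensor_id, timestamp, value)
        return len(self._buf) // RECORD.size - 1

    def read(self, index):
        """Random read by record index: pure offset arithmetic, no search."""
        return RECORD.unpack_from(self._buf, index * RECORD.size)

log = AppendLog()
i = log.append(7, 1000, 59.98)
log.append(7, 1001, 60.01)
assert log.read(i) == (7, 1000, 59.98)
```

In a real deployment the buffer would be a raw block device or preallocated file, which is where the claimed latency win over general file-system interfaces comes from.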


Computing in Science and Engineering | 2012

Velo: A Knowledge-Management Framework for Modeling and Simulation

Ian Gorton; Chandrika Sivaramakrishnan; Gary D. Black; Signe K. White; Sumit Purohit; Carina S. Lansing; Michael C. Madison; Karen L. Schuchardt; Yan Liu

Velo is a reusable, domain-independent knowledge-management infrastructure for modeling and simulation. Velo leverages, integrates, and extends Web-based open source collaborative and data-management technologies to create a scalable and flexible core platform tailored to specific scientific domains. As the examples here describe, Velo has been used in both the carbon sequestration and climate modeling domains.


Environmental Modelling and Software | 2014

A high-performance workflow system for subsurface simulation

Vicky L. Freedman; Xingyuan Chen; Stefan Finsterle; Mark D. Freshley; Ian Gorton; Luke J. Gosink; Elizabeth H. Keating; Carina S. Lansing; William A.M. Moeglein; Christopher J. Murray; George Shu Heng Pau; Ellen A. Porter; Sumit Purohit; Mark L. Rockhold; Karen L. Schuchardt; Chandrika Sivaramakrishnan; Velimir Vessilinov; Scott R. Waichler

The U.S. Department of Energy (DOE) recently invested in developing a numerical modeling toolset called ASCEM (Advanced Simulation Capability for Environmental Management) to support modeling analyses at legacy waste sites. This investment includes the development of an open-source user environment called Akuna that manages subsurface simulation workflows. Core toolsets accessible through the Akuna user interface include model setup, grid generation, sensitivity analysis, model calibration, and uncertainty quantification. Additional toolsets are used to manage simulation data and visualize results. This new workflow technology is demonstrated by streamlining model setup, calibration, and uncertainty analysis using high-performance computation for the BC Cribs Site, a legacy waste area at the Hanford Site in Washington State. For technetium-99 transport, the uncertainty assessment for potential remedial actions (e.g., surface infiltration covers) demonstrates that using multiple realizations of the geologic conceptual model results in greater variation in concentration predictions than when a single model is used.

Highlights:
- Akuna provides the integrated toolset needed for subsurface modeling workflows.
- Akuna streamlines the process of executing multiple simulations in an HPC environment.
- Akuna provides visualization tools for spatial and temporal data.
- An example application demonstrates risk assessment for remedial actions that affect infiltration rates.
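
The abstract's central uncertainty-quantification point, that sampling across multiple geologic conceptual models widens the spread of concentration predictions beyond what a single model shows, can be sketched with a toy Monte Carlo. All numbers and the surrogate "simulation" are hypothetical, not ASCEM outputs:

```python
import random
import statistics

random.seed(42)

def simulate(mean, spread, n=1000):
    # Toy surrogate for a transport simulation: predicted concentrations
    # drawn around one conceptual model's mean (hypothetical values).
    return [random.gauss(mean, spread) for _ in range(n)]

# Single conceptual model: only parameter uncertainty contributes.
single = simulate(mean=5.0, spread=0.5)

# Multiple realizations: each conceptual model adds structural variation
# on top of the same parameter uncertainty.
multi = []
for model_mean in (4.0, 5.0, 6.5):
    multi += simulate(mean=model_mean, spread=0.5)

# The mixture over models has visibly larger spread than any one model.
assert statistics.stdev(multi) > statistics.stdev(single)
```

The inequality holds because the between-model variance adds to the within-model variance, which is exactly the effect the BC Cribs assessment observed.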


Computational Science and Engineering | 2011

Velo: riding the knowledge management wave for simulation and modeling

Ian Gorton; Chandrika Sivaramakrishnan; Gary D. Black; Signe K. White; Sumit Purohit; Michael C. Madison; Karen L. Schuchardt

Modern scientific enterprises are inherently knowledge-intensive. In general, scientific studies in domains such as geosciences, climate, and biology require the acquisition and manipulation of large amounts of experimental and field data in order to create inputs for large-scale computational simulations. The results of these simulations must then be analyzed, leading to refinements of inputs and models and additional simulations. Further, these results must be managed and archived to provide justifications for regulatory decisions and publications that are based on these models. In this paper we introduce our Velo framework, designed as a reusable, domain-independent knowledge-management infrastructure for modeling and simulation. Velo leverages, integrates, and extends open source collaborative and content management technologies to create a scalable and flexible core platform that can be tailored to specific scientific domains. We describe the architecture of Velo for managing and associating the various types of data that are used and created in modeling and simulation projects, as well as the framework for integrating domain-specific tools. To demonstrate a realization of Velo, we describe the Geologic Sequestration Software Suite (GS3) that has been developed to support geologic sequestration modeling. This provides a concrete example of the inherent extensibility and utility of our approach.


Hawaii International Conference on System Sciences | 2013

GridOPTICS™: A Novel Software Framework for Integrating Power Grid Data Storage, Management and Analysis

Ian Gorton; Jian Yin; Bora A. Akyol; Selim Ciraci; Terence Critchlow; Yan Liu; Tara D. Gibson; Sumit Purohit; Poorva Sharma; Maria Vlachopoulou

This paper describes the architecture and design of GridOPTICS™, a novel software framework for integrating a collection of software tools developed by PNNL's Future Power Grid Initiative (FPGI) into a coherent, powerful operations and planning tool for the power grid of the future. GridOPTICS™ enables plug-and-play of various analysis, modeling, and visualization software tools to improve the efficiency and reliability of the power grid. To bridge data access for different control purposes, GridOPTICS™ provides a scalable, lightweight event processing layer that hides the complexity of data collection, storage, delivery, and management. A significant challenge is the requirement to access large amounts of data in real time. We address this challenge through a scalable system architecture that balances system performance and ease of integration. The initial prototype of GridOPTICS™ was demonstrated with several use cases from PNNL's FPGI, showing that our system can provide real-time data access to a diverse set of applications with easy-to-use APIs.
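
The "lightweight event processing layer that hides the complexity of data collection, storage, delivery and management" can be sketched as a minimal publish/subscribe bus. This is an illustrative pattern only; the abstract does not describe GridOPTICS's actual API, and the topic and field names below are hypothetical:

```python
from collections import defaultdict

class EventBus:
    """Toy pub/sub event layer: producers publish to named topics,
    consumers subscribe handlers, and neither side sees the other."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every handler registered for the topic.
        for handler in self._subs[topic]:
            handler(event)

bus = EventBus()
received = []
bus.subscribe("pmu/frequency", received.append)   # an analysis app
bus.publish("pmu/frequency", {"bus_id": 12, "hz": 59.97})  # a data source
assert received == [{"bus_id": 12, "hz": 59.97}]
```

Decoupling producers from consumers this way is what lets analysis, modeling, and visualization tools plug in without knowing how the data is collected or stored.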


IEEE International Conference on Semantic Computing | 2016

Effective Tooling for Linked Data Publishing in Scientific Research

Sumit Purohit; William P. Smith; Alan R. Chappell; Patrick West; Benno Lee; Eric G. Stephan; Peter Fox

Challenges that make it difficult to find, share, and combine published data, such as data heterogeneity and resource discovery, have led to increased adoption of semantic data standards and data publishing technologies. To make data more accessible, interconnected, and discoverable, some domains are being encouraged to publish their data as Linked Data. Consequently, this trend greatly increases the amount of data that semantic web tools are required to process, store, and interconnect. In attempting to process and manipulate large data sets, tools ranging from simple text editors to modern triplestores eventually break down upon reaching undefined thresholds. This paper shares our experiences in curating metadata, primarily to illustrate the challenges and resulting limitations that data publishers and consumers face in the current technological environment. It also provides a Linked Data based solution to the research problem of resource discovery, and offers a systematic approach that data publishers can take to select suitable tools for their data publishing needs. We present a real-world use case, the Resource Discovery for Extreme Scale Collaboration (RDESC) project, which features a scientific dataset (maximum size of 1.4 billion triples) used to evaluate a toolbox for data publishing in climate research. We also introduce a semantic data publishing software suite developed for the RDESC project.
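
To make the kind of output concrete, Linked Data at its simplest is subject-predicate-object triples serialized in a standard format such as N-Triples. Below is a stdlib-only sketch with hypothetical URIs; real publishing pipelines would typically use a library such as rdflib rather than hand-rolled serialization:

```python
def to_ntriples(triples):
    """Serialize (subject, predicate, object) string tuples to N-Triples.
    Objects starting with 'http' are written as IRIs, others as literals."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else '"%s"' % o.replace('"', '\\"')
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

doc = to_ntriples([
    ("http://example.org/dataset/1",
     "http://purl.org/dc/terms/title",
     "Climate observations"),
])
print(doc)
```

One line per triple is also why N-Triples scales to billion-triple datasets like the RDESC corpus: files can be split, streamed, and loaded in parallel without a parser holding global state.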


Annual ACIS International Conference on Computer and Information Science | 2015

Enhancing the impact of science data toward data discovery and reuse

Alan R. Chappell; Jesse Weaver; Sumit Purohit; William P. Smith; Karen L. Schuchardt; Patrick West; Benno Lee; Peter Fox

The amount of data produced in support of scientific research continues to grow rapidly. Despite the accumulation of and demand for scientific data, relatively little data are actually made available to the broader scientific community. We surmise that one root of this problem is the perceived difficulty of electronically publishing scientific data and associated metadata in a way that makes it discoverable. We propose exploiting Semantic Web technologies and best practices to make metadata both discoverable and easy to publish. We share experiences in curating metadata to illustrate the cumbersome nature of data reuse in the current research environment. We also make recommendations, with a real-world example, of how data publishers can provide their metadata by adding limited additional markup to HTML pages on the Web. With little additional effort from data publishers, the difficulty of data discovery, access, and sharing can be greatly reduced and the impact of research data greatly enhanced.
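
The "limited additional markup" recommendation is commonly realized today by embedding machine-readable metadata in an existing HTML page, for example as a schema.org Dataset description in JSON-LD. A minimal sketch with hypothetical dataset values (the paper predates some of this tooling, so treat it as one possible realization, not the paper's exact method):

```python
import json

# Hypothetical dataset description using the schema.org vocabulary,
# which crawlers recognize for dataset discovery.
metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Surface temperature records",
    "creator": {"@type": "Organization", "name": "Example Lab"},
}

# A <script> block a publisher can drop into an existing HTML page;
# the visible page content does not change at all.
snippet = ('<script type="application/ld+json">\n'
           + json.dumps(metadata, indent=2)
           + "\n</script>")
print(snippet)
```

This matches the paper's cost argument: the publisher adds a few lines of markup, and discovery services do the rest.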


Web Search and Data Mining | 2018

Percolator: Scalable Pattern Discovery in Dynamic Graphs

Sutanay Choudhury; Sumit Purohit; Peng Lin; Yinghui Wu; Lawrence B. Holder; Khushbu Agarwal

We demonstrate Percolator, a distributed system for graph pattern discovery in dynamic graphs. In contrast to conventional mining systems, Percolator advocates efficient pattern mining schemes that (1) support pattern detection with keywords; (2) integrate incremental and parallel pattern mining; and (3) support analytical queries such as trend analysis. The core idea of Percolator is to dynamically decide and verify the small fraction of patterns and their instances that must be inspected in response to buffered updates in dynamic graphs, with a total mining cost independent of graph size. We demonstrate (a) the feasibility of incremental pattern mining by walking through each component of Percolator, (b) the efficiency and scalability of Percolator over the sheer size of real-world dynamic graphs, and (c) how the user-friendly GUI of Percolator interacts with users to support keyword-based queries that detect, browse, and inspect trending patterns. We demonstrate how Percolator effectively supports event and trend analysis in social media streams and research publications, respectively.
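
The incremental idea, touching only the state affected by a buffered batch of updates rather than re-mining the whole graph, can be illustrated with a deliberately simplified pattern: counts of labeled-edge types. Percolator's real patterns are subgraphs, so this toy (with hypothetical labels) only conveys the cost model:

```python
from collections import Counter

class IncrementalPatternCounter:
    """Toy incremental miner: maintain counts of (label_u, label_v)
    edge patterns, updating only entries touched by each batch."""
    def __init__(self):
        self.labels = {}           # vertex -> label
        self.counts = Counter()    # (label_u, label_v) -> edge count

    def add_vertex(self, v, label):
        self.labels[v] = label

    def apply_batch(self, inserts):
        # Work is proportional to the batch size, not to the graph size:
        # untouched pattern counts are simply carried over.
        for u, v in inserts:
            self.counts[(self.labels[u], self.labels[v])] += 1

pc = IncrementalPatternCounter()
for v, lab in [(1, "person"), (2, "paper"), (3, "paper")]:
    pc.add_vertex(v, lab)
pc.apply_batch([(1, 2), (1, 3)])        # buffered update batch
assert pc.counts[("person", "paper")] == 2
```

Trend analysis then falls out naturally: comparing the counter snapshots taken after successive batches shows which patterns are growing.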


Conference on Information and Knowledge Management | 2017

When Labels Fall Short: Property Graph Simulation via Blending of Network Structure and Vertex Attributes

Arun V. Sathanur; Sutanay Choudhury; Cliff Joslyn; Sumit Purohit

Property graphs can be used to represent heterogeneous networks with labeled (attributed) vertices and edges. Given a property graph, simulating another graph of the same or greater size with the same statistical properties with respect to the labels and connectivity is critical for privacy preservation and benchmarking purposes. In this work we tackle the problem of capturing the statistical dependence of edge connectivity on the vertex labels, and using the same distribution to regenerate property graphs of the same or expanded size in a scalable manner. However, accurate simulation becomes a challenge when the attributes do not completely explain the network structure. We propose the Property Graph Model (PGM) approach, which uses a label augmentation strategy to mitigate this problem and preserve the vertex label and edge connectivity distributions as well as their correlation, while also replicating the degree distribution. Our proposed algorithm is scalable, with complexity linear in the number of edges of the target graph. We illustrate the efficacy of the PGM approach in regenerating and expanding datasets through two distinct illustrations. Our open-source implementation is available on GitHub.
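
The core mechanism, learning the distribution of edge endpoints conditioned on vertex labels and resampling from it, can be sketched as follows. This omits the paper's label augmentation and degree matching, so it is a simplified illustration of the first step only, with hypothetical labels:

```python
import random
from collections import Counter, defaultdict

random.seed(0)

def label_pair_distribution(edges, labels):
    """Empirical distribution of (label_u, label_v) over observed edges."""
    return Counter((labels[u], labels[v]) for u, v in edges)

def regenerate(dist, labels, n_edges):
    """Sample n_edges new edges whose endpoint labels follow dist."""
    by_label = defaultdict(list)
    for v, lab in labels.items():
        by_label[lab].append(v)
    pairs = list(dist.keys())
    weights = [dist[p] for p in pairs]
    out = []
    for _ in range(n_edges):
        lu, lv = random.choices(pairs, weights=weights)[0]
        out.append((random.choice(by_label[lu]), random.choice(by_label[lv])))
    return out

labels = {0: "A", 1: "A", 2: "B", 3: "B"}
# All observed edges run from an "A" vertex to a "B" vertex...
dist = label_pair_distribution([(0, 2), (1, 2), (1, 3)], labels)
# ...so every regenerated edge does too, at any requested size.
new_edges = regenerate(dist, labels, 6)
assert all(labels[u] == "A" and labels[v] == "B" for u, v in new_edges)
```

Because each sampled edge costs constant time, regeneration is linear in the number of output edges, matching the complexity claim in the abstract.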


IEEE International Conference on Semantic Computing | 2016

User-Centric Approach for Benchmark RDF Data Generator in Big Data Performance Analysis

Sumit Purohit; Patrick R. Paulson; Luke R. Rodriguez

Changes in data generation sources such as social networks, mobile devices, and process automation, along with an increase in the number of instruments generating observational data, have pushed beyond the boundaries of current-day data analysis systems. New algorithms, specialized hardware systems, and computing paradigms are designed to solve problems exhibited by large datasets, but at the same time there is a dearth of flexible and easy-to-use tools to assess the effectiveness of these proposed solutions. Benchmarking tools are required to compare the performance and the cost associated with any Big Data system. This research focuses on a user-centric approach to building such tools and proposes a flexible, extensible, and easy-to-use framework to support performance analysis of Big Data systems. Finally, case studies from two different domains are presented to validate the framework.
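
The essence of a user-centric benchmark data generator, scale and shape exposed as user parameters so the same tool drives small smoke tests and large stress runs, can be sketched like this. All URIs and predicate names are hypothetical, and this is a stand-in for the framework described, not its code:

```python
def generate_triples(n, subjects=100, predicates=("knows", "cites")):
    """Yield n synthetic (s, p, o) RDF-style triples; `n`, `subjects`,
    and `predicates` are the user-chosen scale and shape parameters."""
    for i in range(n):
        s = f"http://example.org/node/{i % subjects}"
        p = f"http://example.org/{predicates[i % len(predicates)]}"
        o = f"http://example.org/node/{(i * 7) % subjects}"
        yield (s, p, o)

# Same generator, two scales: a quick check and a (relatively) big run.
sample = list(generate_triples(10))
bulk = generate_triples(1_000_000)   # lazy; nothing materialized yet
assert len(sample) == 10
```

Feeding the output into each candidate triplestore and timing load and query phases gives the apples-to-apples comparison the abstract argues benchmarking requires.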

Collaboration


Dive into Sumit Purohit's collaborations.

Top Co-Authors

- Ian Gorton, Pacific Northwest National Laboratory
- Karen L. Schuchardt, Pacific Northwest National Laboratory
- Sutanay Choudhury, Pacific Northwest National Laboratory
- Chandrika Sivaramakrishnan, Pacific Northwest National Laboratory
- Alan R. Chappell, Pacific Northwest National Laboratory
- Signe K. White, Pacific Northwest National Laboratory
- Gary D. Black, Pacific Northwest National Laboratory
- Carina S. Lansing, Pacific Northwest National Laboratory
- Jesse Weaver, Pacific Northwest National Laboratory
- Khushbu Agarwal, Pacific Northwest National Laboratory