Alok Gautam Kumbhare
University of Southern California
Publication
Featured research published by Alok Gautam Kumbhare.
Computing in Science and Engineering | 2013
Yogesh Simmhan; Saima Aman; Alok Gautam Kumbhare; Rongyang Liu; Sam Stevens; Qunzhi Zhou; Viktor K. Prasanna
This article focuses on a scalable software platform for the Smart Grid cyber-physical system using cloud technologies. Dynamic Demand Response (D2R) is a challenge application for performing intelligent demand-side management and relieving peak load in Smart Power Grids. The platform offers an adaptive information integration pipeline for ingesting dynamic data; a secure repository for researchers to share knowledge; scalable machine-learning models trained over massive datasets for agile demand forecasting; and a portal for visualizing consumption patterns. It is validated at the University of Southern California's campus microgrid. The article examines the role of clouds and their tradeoffs for use in the Smart Grid cyber-physical system.
international conference on cloud computing | 2011
Yogesh Simmhan; Alok Gautam Kumbhare; Baohua Cao; Viktor K. Prasanna
Power utilities globally are increasingly upgrading to Smart Grids that use bi-directional communication with the consumer to enable an information-driven approach to distributed energy management. Clouds offer features well suited for Smart Grid software platforms and applications, such as elastic resources and shared services. However, the security and privacy concerns inherent in an information-rich Smart Grid environment are further exacerbated by their deployment on Clouds. Here, we present an analysis of security and privacy issues in a Smart Grid software architecture operating on different Cloud environments, in the form of a taxonomy. We use the Los Angeles Smart Grid Project, underway in the largest U.S. municipal utility, to drive this analysis, which will benefit both Cloud practitioners targeting Smart Grid applications and Cloud researchers investigating security and privacy.
european conference on parallel processing | 2014
Yogesh Simmhan; Alok Gautam Kumbhare; Charith Wickramaarachchi; Soonil Nagarkar; Santosh Ravi; Cauligi S. Raghavendra; Viktor K. Prasanna
Vertex centric models for large-scale graph processing are gaining traction due to their simple distributed programming abstraction. However, pure vertex centric algorithms under-perform due to large communication overheads and slow iterative convergence. We introduce GoFFish, a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large-scale graph analytics on commodity clusters, which offers the added natural flexibility of shared-memory sub-graph computation. We map Connected Components, SSSP, and PageRank algorithms to this model and empirically analyze them on several real-world graphs, demonstrating orders-of-magnitude improvements in some cases compared to Apache Giraph's vertex centric framework.
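The sub-graph centric model can be illustrated with connected components, one of the algorithms mapped in the paper. The sketch below is a single-process toy, not GoFFish's API: each partition first labels its vertices locally with no messaging, and BSP supersteps are only needed to reconcile labels across cut edges, which is why convergence takes far fewer supersteps than a pure vertex-centric run.

```python
def propagate(label, edges):
    """Spread the minimum label along the given edges until stable."""
    changed = False
    moving = True
    while moving:
        moving = False
        for u, v in edges:
            lo = min(label[u], label[v])
            if label[u] != lo or label[v] != lo:
                label[u] = label[v] = lo
                moving = changed = True
    return changed

def subgraph_cc(partitions, cut_edges):
    """partitions: list of (vertices, internal_edges); cut_edges span
    partitions. Local work is done per sub-graph without messaging;
    only cut-edge reconciliation costs supersteps."""
    label = {}
    for verts, edges in partitions:
        label.update({v: v for v in verts})
        propagate(label, edges)          # local pass, no messaging
    supersteps = 0
    while True:                          # each iteration ~ one BSP superstep
        supersteps += 1
        if not propagate(label, cut_edges):
            break
        for _verts, edges in partitions:  # re-propagate merged labels locally
            propagate(label, edges)
    return label, supersteps
```

A vertex-centric run of the same algorithm would instead need a number of supersteps proportional to the graph's diameter, since labels travel one hop per superstep.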
international conference on cloud computing | 2012
Alok Gautam Kumbhare; Yogesh Simmhan; Viktor K. Prasanna
Cloud storage has become immensely popular for maintaining synchronized copies of files and for sharing documents with collaborators. However, there is heightened concern about the security and privacy of Cloud-hosted data due to the shared infrastructure model and an implicit trust in the service providers. Emerging needs of secure data storage and sharing for domains like Smart Power Grids, which deal with sensitive consumer data, require the persistence and availability of Cloud storage but with client-controlled security and encryption, low key management overhead, and minimal performance costs. Cryptonite is a secure Cloud storage repository that addresses these requirements using a Strongbox model for shared key management. We describe the Cryptonite service and desktop client, discuss performance optimizations, and provide an empirical analysis of the improvements. Our experiments show that Cryptonite clients achieve a 40% improvement in file upload bandwidth over plaintext storage using the Azure Storage Client API despite the added security benefits, while our file download performance is 5 times faster than the baseline for files greater than 100MB.
ieee international conference on cloud computing technology and science | 2015
Alok Gautam Kumbhare; Yogesh Simmhan; Marc Frîncu; Viktor K. Prasanna
The need for low-latency analysis over high-velocity data streams motivates distributed continuous dataflow systems. Contemporary stream processing systems use simple techniques to scale on elastic cloud resources to handle variable data rates. However, application QoS is also impacted by the variability in resource performance exhibited by clouds, which necessitates autonomic methods of provisioning elastic resources to support such applications on cloud infrastructure. We develop the concept of "dynamic dataflows", which utilize alternate tasks as additional control over the dataflow's cost and QoS. Further, we formalize an optimization problem to represent deployment and runtime resource provisioning that allows us to balance the application's QoS, value, and resource cost. We propose two greedy heuristics, centralized and sharded, based on the variable-sized bin packing algorithm, and compare them against a Genetic Algorithm (GA) based heuristic that gives a near-optimal solution. A large-scale simulation study, using the Linear Road benchmark and VM performance traces from the AWS public cloud, shows that while the GA-based heuristic provides a better-quality schedule, the greedy heuristics are more practical and can intelligently utilize cloud elasticity to mitigate the effect of variability, both in input data rates and in cloud resource performance, to meet the QoS of fast data applications.
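The greedy heuristics are based on variable-sized bin packing. As a rough illustration only (the function, VM types, and cost model here are invented for the example, not taken from the paper), a first-fit-decreasing pass might place task loads onto VM "bins" of differing capacity and hourly cost:

```python
def greedy_pack(tasks, vm_types):
    """Illustrative first-fit-decreasing over variable-sized bins.
    tasks: CPU demands; vm_types: name -> (capacity, hourly_cost).
    When a new VM must be opened, prefer the best cost per unit of
    capacity among types that can fit the task."""
    ranked = sorted(vm_types.items(), key=lambda kv: kv[1][1] / kv[1][0])
    vms = []                              # each entry: [vm_name, remaining]
    for demand in sorted(tasks, reverse=True):
        for vm in vms:                    # first fit on already-open VMs
            if vm[1] >= demand:
                vm[1] -= demand
                break
        else:                             # open the best-value VM that fits
            name, cap = next((n, c) for n, (c, _) in ranked if c >= demand)
            vms.append([name, cap - demand])
    return vms
```

For example, with `vm_types = {'small': (2, 1.0), 'large': (4, 1.5)}` and tasks `[3, 3, 2]`, three `large` VMs are provisioned, since `large` offers the better cost per unit of capacity.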
cluster computing and the grid | 2014
Alok Gautam Kumbhare; Yogesh Simmhan; Viktor K. Prasanna
Scalable stream processing and continuous dataflow systems are gaining traction with the rise of big data due to the need for processing high-velocity data in near real time. Unlike in batch processing systems such as MapReduce and workflows, static scheduling strategies fall short for continuous dataflows due to variations in input data rates and the need for sustained throughput. The elastic resource provisioning of cloud infrastructure is valuable for meeting the changing resource needs of such continuous applications. However, multi-tenant cloud resources introduce yet another dimension of performance variability that impacts the application's throughput. In this paper we propose PLAStiCC, an adaptive scheduling algorithm that balances resource cost and application throughput using a prediction-based look-ahead approach. It addresses not only variations in the input data rates but also those in the underlying cloud infrastructure. In addition, we propose several simpler static scheduling heuristics that operate in the absence of an accurate performance prediction model. These static and adaptive heuristics are evaluated through extensive simulations using performance traces obtained from the Amazon AWS IaaS public cloud. Our results show an improvement of up to 20% in overall profit compared to the reactive adaptation algorithm.
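The contrast between reactive and look-ahead scheduling can be sketched as follows. This is an invented moving-average predictor, far simpler than the paper's prediction model, but it shows the shape of the idea: capacity is provisioned for the predicted next-window input rate rather than the currently observed one, so the scheduler acts before throughput drops instead of after.

```python
import math
from collections import deque

class LookAheadScaler:
    """Toy look-ahead provisioner (illustrative only, not the paper's
    algorithm): predict the next window's input rate from a moving
    average of recent observations and size the VM pool for it."""
    def __init__(self, per_vm_rate, window=5):
        self.per_vm_rate = per_vm_rate       # tuples/sec one VM sustains
        self.history = deque(maxlen=window)  # recent observed rates

    def observe(self, rate):
        self.history.append(rate)

    def vms_needed(self):
        if not self.history:
            return 1
        predicted = sum(self.history) / len(self.history)
        # provision for the predicted rate, rounded up to whole VMs
        return max(1, math.ceil(predicted / self.per_vm_rate))
```

A reactive scheduler would instead size the pool from the last observed rate alone, repaying any burst with a period of degraded throughput while new VMs spin up.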
Proceedings of the second international workshop on Data intensive computing in the clouds | 2011
Alok Gautam Kumbhare; Yogesh Simmhan; Viktor K. Prasanna
As Cloud platforms gain increasing traction among scientific and business communities for outsourcing storage, computing, and content delivery, there is also growing concern about the associated loss of control over private data hosted in the Cloud. In this paper, we present an architecture for a secure data repository service designed on top of a public Cloud infrastructure to support multi-disciplinary scientific communities dealing with personal and human-subject data, motivated by the smart power grid domain. Our repository model allows users to securely store and share their data in the Cloud without revealing the plain text to unauthorized users, the Cloud storage provider, or the repository itself. The system masks file names, user permissions, and access patterns while providing auditing capabilities with provable data updates.
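One piece of this, masking file names from the provider, can be illustrated with a keyed hash. This is only a sketch of the general technique (the abstract does not specify the repository's actual scheme): the client derives an opaque, deterministic identifier from the file name under a secret key, so the client can still look files up by recomputing the tag while the provider learns nothing about the name.

```python
import hashlib
import hmac

def mask_name(secret: bytes, filename: str) -> str:
    """Derive an opaque, deterministic storage identifier from a file
    name using HMAC-SHA256. Deterministic, so the same (key, name)
    pair always maps to the same masked identifier for lookups."""
    tag = hmac.new(secret, filename.encode("utf-8"), hashlib.sha256)
    return tag.hexdigest()
```

Without the key, the provider sees only 64-character hex identifiers and cannot invert or even equality-test names across users holding different keys.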
international conference on distributed computing systems | 2015
Alok Gautam Kumbhare; Marc Frîncu; Yogesh Simmhan; Viktor K. Prasanna
The MapReduce programming model, due to its simplicity and scalability, has become an essential tool for processing large data volumes in distributed environments. Recent Stream Processing Systems (SPSs) extend this model to provide low-latency analysis of high-velocity continuous data streams. However, integrating MapReduce with streaming poses challenges: first, runtime variations in data characteristics such as data rates and key distribution cause resource overload, which in turn leads to fluctuations in the Quality of Service (QoS); and second, the stateful reducers, whose state depends on the complete tuple history, necessitate efficient fault-recovery mechanisms to maintain the desired QoS in the presence of resource failures. We propose an integrated streaming MapReduce architecture that leverages consistent hashing to support runtime elasticity, along with locality-aware data and state replication, to provide efficient load balancing with low-overhead fault tolerance and parallel recovery from multiple simultaneous failures. Our evaluation on a private cloud shows up to 2.8× improvement in peak throughput compared to the Apache Storm SPS, and a low recovery latency of 700-1500 ms from multiple failures.
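The elasticity mechanism rests on consistent hashing. A minimal ring, illustrative rather than the paper's implementation, shows why it helps: when a reducer is added or removed, only the keys falling in its arc of the ring move, so scaling the reducer pool does not reshuffle all keyed state.

```python
import bisect
import hashlib

def _point(s: str) -> int:
    """Hash a string to a position on the ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent hash ring with virtual nodes. Keys map to
    the first reducer clockwise from the key's hash point."""
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []                    # sorted list of (point, node)
        for n in nodes:
            self.add(n)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_point(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def lookup(self, key):
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, _point(key)) % len(self.ring)
        return self.ring[i][1]
```

Adding a reducer `r4` to a three-node ring reassigns roughly a quarter of the keys, all of them to `r4`; every key not claimed by `r4` keeps its old owner, which is exactly the property that keeps state migration cheap during elastic scaling.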
international parallel and distributed processing symposium | 2015
Yogesh Simmhan; Neel Choudhury; Charith Wickramaarachchi; Alok Gautam Kumbhare; Marc Frîncu; Cauligi S. Raghavendra; Viktor K. Prasanna
Graphs are a key form of Big Data, and performing scalable analytics over them is invaluable to many domains. There is an emerging class of inter-connected data which accumulates or varies over time, and on which novel algorithms, both over the network structure and across the time-variant attribute values, are necessary. We formalize the notion of time-series graphs and propose a Temporally Iterative BSP programming abstraction to develop algorithms on such datasets using several design patterns. Our abstractions leverage a sub-graph centric programming model and extend it to the temporal dimension. We present three time-series graph algorithms based on these design patterns and abstractions, and analyze their performance using the GoFFish distributed platform on the Amazon AWS Cloud. Our results demonstrate the efficacy of the abstractions for developing practical time-series graph algorithms and scaling them on commodity hardware.
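The Temporally Iterative BSP abstraction can be caricatured as an outer loop over time. The skeleton below is an assumed shape for illustration, not the paper's API: a BSP-style computation runs on each graph snapshot in time order, and the converged state of one snapshot seeds the computation on the next.

```python
def temporal_bsp(snapshots, compute, init_state):
    """Illustrative Temporally Iterative BSP skeleton (assumed shape).
    snapshots: graph snapshots in time order; compute(graph, state)
    stands in for the BSP supersteps run within one snapshot and
    returns the converged state, which is carried forward."""
    state = init_state
    results = []
    for graph in snapshots:              # temporal (outer) iteration
        state = compute(graph, state)    # BSP supersteps within a snapshot
        results.append(state)
    return results
```

With a trivial `compute` that accumulates a per-snapshot value, the carried-forward state yields a running aggregate across the time series, the pattern a time-variant attribute analysis would build on.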
ieee/acm international symposium cluster, cloud and grid computing | 2015
Charith Wickramaarachchi; Alok Gautam Kumbhare; Marc Frîncu; Charalampos Chelmis; Viktor K. Prasanna
Existing Big Data streams coming from social and other connected sensor networks exhibit intrinsic inter-dependencies, posing unique challenges to scalable graph analytics. Data from these graphs is usually collected in geographically distributed data servers, making it suitable for distributed processing on clouds. While numerous solutions for large-scale static graph analysis have been proposed, addressing the dynamics of social interactions in real time requires novel approaches that leverage incremental stream processing and graph analytics on elastic clouds. We propose a scalable solution based on our stream processing engine, Floe, on top of which we perform real-time data processing and graph updates to enable low-latency graph analytics on large evolving social networks. We demonstrate the platform on a large Twitter data set by performing several fast graph and non-graph analytics to extract, in real time, the top-k influential nodes under different metrics during key events such as the US NFL playoffs. This information allows advertisers to maximize their exposure by always targeting the continuously changing set of most influential nodes; its applicability spans multiple domains including surveillance, counter-terrorism, and disease-spread monitoring. The evaluation will be performed on a combination of our local cluster of 16 eight-core nodes running the Eucalyptus fabric and hundreds of virtual machines on the Amazon AWS public cloud. We will showcase the low latency in detecting changes in the graph under variable data streams, as well as the platform's efficiency in utilizing resources and elastically scaling to meet demand.
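As a toy stand-in for the kind of incremental analytic the demo runs (not the Floe implementation, and using a simple mention count where the demo supports multiple influence metrics), a counter-based tracker can maintain the top-k most mentioned users as tweets stream in:

```python
import heapq
from collections import Counter

class TopKTracker:
    """Incrementally track the k most mentioned users in a stream.
    Each update is O(1); reading the top-k is O(n log k) over the
    distinct users seen so far."""
    def __init__(self, k):
        self.k = k
        self.counts = Counter()

    def update(self, mentioned_user):
        self.counts[mentioned_user] += 1

    def top_k(self):
        return heapq.nlargest(self.k, self.counts.items(),
                              key=lambda kv: kv[1])
```

Because the ranking is recomputed from live counts, the reported set shifts continuously as the event unfolds, which is the behavior the advertising use case above depends on.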