Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Pramod Bhatotia is active.

Publication


Featured research published by Pramod Bhatotia.


Symposium on Cloud Computing | 2011

Incoop: MapReduce for incremental computations

Pramod Bhatotia; Alexander Wieder; Rodrigo Rodrigues; Umut A. Acar; Rafael Pasquin

Many online data sets evolve over time as new entries are slowly added and existing entries are deleted or modified. Taking advantage of this, systems for incremental bulk data processing, such as Google's Percolator, can achieve efficient updates. To achieve this efficiency, however, these systems lose compatibility with the simple programming models offered by non-incremental systems, e.g., MapReduce, and more importantly, require the programmer to implement application-specific dynamic algorithms, ultimately increasing algorithm and code complexity. In this paper, we describe the architecture, implementation, and evaluation of Incoop, a generic MapReduce framework for incremental computations. Incoop detects changes to the input and automatically updates the output by employing an efficient, fine-grained result reuse mechanism. To achieve efficiency without sacrificing transparency, we adopt recent advances in the area of programming languages to identify the shortcomings of task-level memoization approaches, and to address these shortcomings by using several novel techniques: a storage system, a contraction phase for Reduce tasks, and an affinity-based scheduling algorithm. We have implemented Incoop by extending the Hadoop framework, and evaluated it by considering several applications and case studies. Our results show significant performance improvements without changing a single line of application code.
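
The result-reuse mechanism can be pictured as task-level memoization keyed by the content of each input chunk. Below is a minimal, self-contained sketch of that idea for a toy word-count job; the chunk hashing and function names are illustrative assumptions, not Incoop's actual Hadoop-level machinery.

```python
# Toy illustration of task-level memoization for incremental MapReduce
# (word count). Names and structure are illustrative, not Incoop's API.
import hashlib
from collections import Counter

memo = {}  # chunk digest -> cached map output

def map_chunk(chunk):
    return Counter(chunk.split())

def incremental_wordcount(chunks):
    total = Counter()
    for chunk in chunks:
        key = hashlib.sha1(chunk.encode()).hexdigest()
        if key not in memo:            # re-run the map task only if the chunk changed
            memo[key] = map_chunk(chunk)
        total += memo[key]             # reduce: merge cached and fresh partial counts
    return total

run1 = incremental_wordcount(["a b a", "c d", "e f"])
run2 = incremental_wordcount(["a b a", "c d d", "e f"])  # only the second chunk is re-mapped
```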


Principles of Distributed Computing | 2010

Brief announcement: modelling MapReduce for optimal execution in the cloud

Alexander Wieder; Pramod Bhatotia; Ansley Post; Rodrigo Rodrigues

We describe a model for MapReduce computations that can be used to optimize the increasingly complex choice of resources that cloud customers purchase.


Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware | 2010

Conductor: orchestrating the clouds

Alexander Wieder; Pramod Bhatotia; Ansley Post; Rodrigo Rodrigues

Cloud computing enables customers to access virtually unlimited resources on demand and without any fixed upfront cost. However, the commoditization of computing resources imposes new challenges in how to manage them: customers of cloud services are no longer restricted to the resources they own, but instead choose from a variety of different services offered by different providers, and the impact of these choices on price and overall performance is not always clear. Furthermore, having to take into account new cloud products and services, the cost of recovering from faults, or price fluctuations due to spot markets makes the picture even more unclear. This position paper highlights a series of challenges that must be overcome in order to allow customers to better leverage cloud resources. We also make the case for a system called Conductor that automatically manages resources in cloud computing to meet user-specifiable optimization goals, such as minimizing monetary cost or completion time. Finally, we discuss some of the challenges we will face in building such a system.
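
As a rough illustration of the kind of user-specifiable optimization goal mentioned above, the following toy sketch brute-forces the cheapest mix of two made-up instance types that still meets a deadline; the prices, speeds, and search strategy are hypothetical and are not Conductor's planner.

```python
# Toy resource-selection sketch: find the cheapest mix of hypothetical instance
# types that finishes a fixed amount of work before a deadline.
from itertools import product

WORK = 1000.0          # abstract work units
DEADLINE = 10.0        # hours
instance_types = {     # made-up $/hour and work-units/hour
    "small": (0.10, 20.0),
    "large": (0.35, 80.0),
}

best = None
for counts in product(range(0, 11), repeat=len(instance_types)):
    rate = sum(n * speed for n, (_, speed) in zip(counts, instance_types.values()))
    if rate == 0:
        continue
    hours = WORK / rate
    cost = hours * sum(n * price for n, (price, _) in zip(counts, instance_types.values()))
    if hours <= DEADLINE and (best is None or cost < best[0]):
        best = (cost, dict(zip(instance_types, counts)), hours)

print(best)  # cheapest plan that meets the deadline
```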


European Conference on Computer Systems | 2017

SGXBounds: Memory Safety for Shielded Execution

Dmitrii Kuvaiskii; Oleksii Oleksenko; Sergei Arnautov; Bohdan Trach; Pramod Bhatotia; Pascal Felber; Christof Fetzer

Shielded execution based on Intel SGX provides strong security guarantees for legacy applications running on untrusted platforms. However, memory safety attacks such as Heartbleed can render the confidentiality and integrity properties of shielded execution completely ineffective. To prevent these attacks, the state-of-the-art memory-safety approaches can be used in the context of shielded execution. In this work, we first showcase that two prominent software- and hardware-based defenses, AddressSanitizer and Intel MPX respectively, are impractical for shielded execution due to high performance and memory overheads. This motivated our design of SGXBounds, an efficient memory-safety approach for shielded execution exploiting the architectural features of Intel SGX. Our design is based on a simple combination of tagged pointers and compact memory layout. We implemented SGXBounds based on the LLVM compiler framework targeting unmodified multithreaded applications. Our evaluation using Phoenix, PARSEC, and RIPE benchmark suites shows that SGXBounds has performance and memory overheads of 17% and 0.1% respectively, while providing security guarantees similar to AddressSanitizer and Intel MPX. We have obtained similar results with SPEC CPU2006 and four real-world case studies: SQLite, Memcached, Apache, and Nginx.
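
The tagged-pointer idea can be modelled compactly: pack the object's upper bound into the otherwise unused upper pointer bits, keep the lower bound in metadata next to the object, and check both on every access. The Python model below only illustrates that check under this assumed layout; SGXBounds itself instruments LLVM IR in compiled enclave code.

```python
# Conceptual simulation of a tagged-pointer bounds check: the upper 32 bits of
# a 64-bit "pointer" carry the object's upper bound, and the lower bound is
# kept in metadata at that bound address. Illustration only.
TAG_SHIFT = 32
memory_lb = {}          # upper-bound address -> lower bound (stands in for in-memory metadata)

def make_tagged(base, size):
    ub = base + size                    # address just past the object
    memory_lb[ub] = base                # store the lower bound at the upper-bound location
    return (ub << TAG_SHIFT) | base     # pack the upper bound into the high 32 bits

def check_access(tagged_ptr, offset, access_size):
    addr = (tagged_ptr & 0xFFFFFFFF) + offset
    ub = tagged_ptr >> TAG_SHIFT
    lb = memory_lb[ub]
    if not (lb <= addr and addr + access_size <= ub):
        raise MemoryError("out-of-bounds access at 0x%x" % addr)
    return addr                         # safe, untagged address

p = make_tagged(0x1000, 64)
check_access(p, 0, 8)                   # in bounds
try:
    check_access(p, 60, 8)              # crosses the upper bound
except MemoryError as e:
    print(e)
```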


International World Wide Web Conferences | 2016

IncApprox: A Data Analytics System for Incremental Approximate Computing

Dhanya R. Krishnan; Do Le Quoc; Pramod Bhatotia; Christof Fetzer; Rodrigo Rodrigues

Incremental and approximate computations are increasingly being adopted for data analytics to achieve low-latency execution and efficient utilization of computing resources. Incremental computation updates the output incrementally instead of re-computing everything from scratch for successive runs of a job with input changes. Approximate computation returns an approximate output for a job instead of the exact output. Both paradigms rely on computing over a subset of data items instead of computing over the entire dataset, but they differ in their means for skipping parts of the computation. Incremental computing relies on the memoization of intermediate results of sub-computations, and reusing these memoized results across jobs. Approximate computing relies on representative sampling of the entire dataset to compute over a subset of data items. In this paper, we observe that these two paradigms are complementary, and can be married together! Our idea is quite simple: design a sampling algorithm that biases the sample selection to the memoized data items from previous runs. To realize this idea, we designed an online stratified sampling algorithm that uses self-adjusting computation to produce an incrementally updated approximate output with bounded error. We implemented our algorithm in a data analytics system called IncApprox based on Apache Spark Streaming. Our evaluation using micro-benchmarks and real-world case-studies shows that IncApprox achieves the benefits of both incremental and approximate computing.
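
The core biasing idea can be sketched in a few lines: within each stratum, prefer items whose sub-results are already memoized from a previous run, and top up with fresh items. The sampling policy and the `process`/`memoized` names below are illustrative assumptions, not IncApprox's actual algorithm or its error-bound machinery.

```python
# Sketch of stratified sampling biased toward memoized items, so the
# approximate answer can reuse cached sub-computations. Illustration only.
import random
from collections import defaultdict

memoized = {}   # item -> cached sub-result from an earlier run

def biased_stratified_sample(items, stratum_of, per_stratum):
    strata = defaultdict(list)
    for it in items:
        strata[stratum_of(it)].append(it)
    sample = []
    for members in strata.values():
        cached = [it for it in members if it in memoized]
        fresh = [it for it in members if it not in memoized]
        random.shuffle(cached)
        random.shuffle(fresh)
        sample.extend((cached + fresh)[:per_stratum])  # prefer memoized items, top up with fresh ones
    return sample

def process(item):
    if item not in memoized:
        memoized[item] = item * item    # stand-in for an expensive sub-computation
    return memoized[item]

sample = biased_stratified_sample(range(100), stratum_of=lambda x: x % 4, per_stratum=5)
estimate = sum(process(x) for x in sample) * (100 / len(sample))   # scale up to the full dataset
sample2 = biased_stratified_sample(range(100), lambda x: x % 4, 5)  # now favours the 20 memoized items
```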


Architectural Support for Programming Languages and Operating Systems | 2015

iThreads: A Threading Library for Parallel Incremental Computation

Pramod Bhatotia; Pedro Fonseca; Umut A. Acar; Björn B. Brandenburg; Rodrigo Rodrigues

Incremental computation strives for efficient successive runs of applications by re-executing only those parts of the computation that are affected by a given input change instead of recomputing everything from scratch. To realize these benefits automatically, we describe iThreads, a threading library for parallel incremental computation. iThreads supports unmodified shared-memory multithreaded programs: it can be used as a replacement for pthreads by a simple exchange of dynamically linked libraries, without even recompiling the application code. To enable such an interface, we designed algorithms and an implementation to operate at the compiled binary code level by leveraging MMU-assisted memory access tracking and process-based thread isolation. Our evaluation on a multicore platform using applications from the PARSEC and Phoenix benchmarks and two case-studies shows significant performance gains.
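
Conceptually, each sub-computation's read set is recorded and, on the next run, only the sub-computations whose read sets intersect the changed data are re-executed. The sketch below makes that tracking explicit with a wrapper object and invented names; iThreads performs it transparently at page granularity through MMU protections, which this model does not attempt to reproduce.

```python
# Conceptual model of incremental re-execution: record what each slice reads,
# then re-run only the slices whose read set overlaps the changed locations.
class TrackedMemory:
    def __init__(self, data):
        self.data = dict(data)
        self.read_log = set()
    def read(self, key):
        self.read_log.add(key)
        return self.data[key]
    def write(self, key, value):
        self.data[key] = value

def run_incremental(memory, slices, cache, changed_keys):
    results = {}
    for name, fn in slices.items():
        prev = cache.get(name)
        if prev and not (prev["reads"] & changed_keys):
            results[name] = prev["result"]          # read set untouched: reuse cached result
            continue
        memory.read_log = set()
        results[name] = fn(memory)                  # re-execute the affected slice
        cache[name] = {"reads": set(memory.read_log), "result": results[name]}
    return results

mem = TrackedMemory({"a": 1, "b": 2, "c": 3})
slices = {"s1": lambda m: m.read("a") + m.read("b"), "s2": lambda m: m.read("c") * 10}
cache = {}
run_incremental(mem, slices, cache, changed_keys=set(mem.data))   # first run: everything executes
mem.write("c", 7)
run_incremental(mem, slices, cache, changed_keys={"c"})           # only s2 re-executes
```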


European Conference on Computer Systems | 2016

HAFT: hardware-assisted fault tolerance

Dmitrii Kuvaiskii; Rasha Faqeh; Pramod Bhatotia; Pascal Felber; Christof Fetzer

Transient hardware faults during the execution of a program can cause data corruptions. We present HAFT, a fault tolerance technique using hardware extensions of commodity CPUs to protect unmodified multithreaded applications against such corruptions. HAFT utilizes instruction-level redundancy for fault detection and hardware transactional memory for fault recovery. We evaluated HAFT with Phoenix and PARSEC benchmarks. The observed normalized runtime is 2x, with 98.9% of the injected data corruptions being detected and 91.2% being corrected. To demonstrate the effectiveness of HAFT, we applied it to real-world case studies including Memcached, Apache, and SQLite.
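
The detect-and-recover control flow can be mimicked in a few lines: run each step twice, compare, and roll back and retry on a mismatch. The sketch below is only a Python model of that flow with an injected fault; HAFT itself duplicates instructions at the LLVM IR level and uses Intel TSX transactions for rollback.

```python
# Model of redundancy-based fault detection with retry-on-mismatch recovery.
import random

def flaky_add(a, b):
    result = a + b
    if random.random() < 0.05:          # inject a rare transient fault
        result ^= 1
    return result

def redundant_step(a, b, max_retries=5):
    for _ in range(max_retries):        # stands in for the transactional retry path
        r1 = flaky_add(a, b)            # original "instruction"
        r2 = flaky_add(a, b)            # shadow copy
        if r1 == r2:                    # detection by comparison; identical faults in both
            return r1                   # copies can escape, mirroring real coverage limits
    raise RuntimeError("uncorrectable fault: retries exhausted")

print(sum(redundant_step(i, i) for i in range(1000)))
```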


International Middleware Conference | 2014

Slider: incremental sliding window analytics

Pramod Bhatotia; Umut A. Acar; Flavio Junqueira; Rodrigo Rodrigues

Sliding window analytics is often used in distributed data-parallel computing for analyzing large streams of continuously arriving data. When pairs of consecutive windows overlap, there is a potential to update the output incrementally, more efficiently than recomputing from scratch. However, in most systems, realizing this potential requires programmers to explicitly manage the intermediate state for overlapping windows, and devise an application-specific algorithm to incrementally update the output. In this paper, we present self-adjusting contraction trees, a set of data structures and algorithms for transparently updating the output of a sliding window computation as the window moves, while reusing, to the extent possible, results from prior computations. Self-adjusting contraction trees structure sub-computations of a data-parallel computation in the form of a shallow (logarithmic depth) balanced data dependence graph, through which input changes are efficiently propagated in asymptotically sub-linear time. We implemented self-adjusting contraction trees in a system called Slider. The design of Slider incorporates several novel techniques, most notably: (i) a set of self-balancing trees tuned for different variants of sliding window computation (append-only, fixed-width, or variable-width slides); (ii) a split processing mode, where a background pre-processing stage leverages the predictability of input changes to pave the way for a more efficient foreground processing when the window slides; and (iii) an extension of the data structures to handle multiple-job workflows such as data-flow query processing. We evaluated Slider using a variety of applications and real-world case studies. Our results show significant performance gains without requiring any changes to the existing application code used for non-incremental data processing.
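
The logarithmic-depth reuse can be illustrated with a plain balanced reduction tree: leaves hold window items, internal nodes hold partial aggregates, and a single-item change recomputes only the ancestors of that leaf. The flat-array layout below is a generic segment-tree sketch, not Slider's self-adjusting contraction trees, but it shows why an update costs O(log n) rather than a full recomputation.

```python
# Balanced reduction tree: point update refreshes only the path to the root.
class ReductionTree:
    def __init__(self, items, combine=lambda a, b: a + b, identity=0):
        self.n = len(items)
        self.combine, self.identity = combine, identity
        self.tree = [identity] * (2 * self.n)
        self.tree[self.n:] = items
        for i in range(self.n - 1, 0, -1):         # build internal partial aggregates
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, index, value):
        i = index + self.n
        self.tree[i] = value
        while i > 1:                               # recompute only the ancestors
            i //= 2
            self.tree[i] = self.combine(self.tree[2 * i], self.tree[2 * i + 1])

    def result(self):
        return self.tree[1] if self.n else self.identity

window = ReductionTree([3, 1, 4, 1, 5, 9, 2, 6])
window.update(2, 40)       # one element of the window changes
print(window.result())     # aggregate refreshed in logarithmic time
```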


International Conference on Distributed Computing Systems | 2016

INSPECTOR: Data Provenance Using Intel Processor Trace (PT)

Jörg Thalheim; Pramod Bhatotia; Christof Fetzer

Data provenance strives for explaining how the computation was performed by recording a trace of the execution. The provenance trace is useful across a wide range of workflows to improve the dependability, security, and efficiency of software systems. In this paper, we present Inspector, a POSIX-compliant data provenance library for shared-memory multithreaded programs. The Inspector library is completely transparent and easy to use: it can be used as a replacement for the pthreads library by a simple exchange of libraries linked, without even recompiling the application code. To achieve this result, we present a parallel provenance algorithm that records control, data, and schedule dependencies using a Concurrent Provenance Graph (CPG). We implemented our algorithm to operate at the compiled binary code level by leveraging a combination of OS-specific mechanisms, and recently released Intel PT ISA extensions as part of the Broadwell micro-architecture. Our evaluation on a multicore platform using applications from multithreaded benchmark suites (PARSEC and Phoenix) shows reasonable provenance overheads for a majority of applications. Lastly, we briefly describe three case-studies where the generic interface exported by Inspector is being used to improve the dependability, security, and efficiency of systems. The Inspector library is publicly available for further use in a wide range of other provenance workflows.
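
A provenance graph of this kind can be pictured as events linked by read-after-write edges, so the lineage of any output can be walked backwards. The explicit recorder below is a toy model with invented class and method names; Inspector reconstructs equivalent information at the binary level from Intel PT traces rather than through instrumented reads and writes.

```python
# Toy provenance recorder: link each read to the write that produced the value
# it observed, then trace an output's lineage backwards. Illustration only.
from collections import defaultdict

class ProvenanceStore:
    def __init__(self):
        self.values = {}                 # location -> (value, producing event id)
        self.nodes = {}                  # event id -> (thread, op, location)
        self.edges = defaultdict(set)    # event id -> events it depends on
        self.next_id = 0

    def _event(self, thread, op, loc):
        self.next_id += 1
        self.nodes[self.next_id] = (thread, op, loc)
        return self.next_id

    def write(self, thread, loc, value, depends_on=()):
        ev = self._event(thread, "write", loc)
        self.edges[ev] |= set(depends_on)            # data dependencies of this write
        self.values[loc] = (value, ev)
        return ev

    def read(self, thread, loc):
        value, producer = self.values[loc]
        ev = self._event(thread, "read", loc)
        self.edges[ev].add(producer)                 # read-after-write edge
        return value, ev

    def lineage(self, event):
        seen, stack = set(), [event]
        while stack:
            ev = stack.pop()
            if ev not in seen:
                seen.add(ev)
                stack.extend(self.edges[ev])
        return seen

g = ProvenanceStore()
w1 = g.write("T1", "x", 2)
v, r1 = g.read("T2", "x")
w2 = g.write("T2", "y", v * 3, depends_on=[r1])
print(g.lineage(w2))                                 # traces back through r1 to w1
```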


Archive | 2015

Incremental parallel and distributed systems

Pramod Bhatotia; Rodrigo Rodrigues; Peter Druschel

Incremental computation strives for efficient successive runs of applications by re-executing only those parts of the computation that are affected by a given input change instead of recomputing everything from scratch. To realize the benefits of incremental computation, researchers and practitioners are developing new systems where the application programmer can provide an efficient update mechanism for changing application data. Unfortunately, most of the existing solutions are limiting because they not only depart from existing programming models, but also require programmers to devise an incremental update mechanism (or a dynamic algorithm) on a per-application basis. In this thesis, we present incremental parallel and distributed systems that enable existing real-world applications to automatically benefit from efficient incremental updates. Our approach neither requires departure from current models of programming, nor the design and implementation of dynamic algorithms. To achieve these goals, we have designed and built the following incremental systems: (i) Incoop — a system for incremental MapReduce computation; (ii) Shredder — a GPU-accelerated system for incremental storage; (iii) Slider — a stream processing platform for incremental sliding-window analytics; and (iv) iThreads — a threading library for parallel incremental computation. Our experience with these systems shows that significant performance gains can be achieved for existing applications without requiring any additional effort from programmers.

Collaboration


Dive into Pramod Bhatotia's collaborations.

Top Co-Authors

Christof Fetzer
Dresden University of Technology

Do Le Quoc
Dresden University of Technology

Dmitrii Kuvaiskii
Dresden University of Technology

Thorsten Strufe
Dresden University of Technology

Oleksii Oleksenko
Dresden University of Technology

Pascal Felber
University of Neuchâtel