Vaibhav Arora
University of California, Santa Barbara
Publications
Featured research published by Vaibhav Arora.
international conference on management of data | 2015
Faisal Nawab; Vaibhav Arora; Divyakant Agrawal; Amr El Abbadi
Cross-datacenter replication is increasingly being deployed to bring data closer to the user and to overcome datacenter outages. The extent of the influence of wide-area communication on serializable transactions is not yet clear. In this work, we derive a lower bound on commit latency: the sum of the commit latencies of any two datacenters is at least the Round-Trip Time (RTT) between them. We use the insights and lessons learned while deriving the lower bound to develop a commit protocol, called Helios, that achieves low commit latencies. Helios actively exchanges transaction logs (history) between datacenters, and the received logs are used to decide whether a transaction can commit. Helios determines the earliest point in the received logs needed to commit a transaction, ensuring a low commit latency. As we show in the paper, Helios is theoretically able to achieve the lower-bound commit latency. Moreover, in a real-world deployment across five datacenters, Helios achieves a commit latency close to optimal.
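The lower bound above can be stated concretely: for any assignment of commit latencies, the latencies of any pair of datacenters must sum to at least the RTT between them. A minimal sketch (not the Helios protocol itself, with hypothetical RTT values) that checks a candidate assignment against this bound:

```python
# Check the paper's lower bound: L[i] + L[j] >= RTT(i, j) for every
# pair of datacenters i, j.

def satisfies_lower_bound(latencies, rtt):
    """Return True if a candidate commit-latency assignment respects the bound."""
    n = len(latencies)
    return all(latencies[i] + latencies[j] >= rtt[i][j]
               for i in range(n) for j in range(i + 1, n))

# Hypothetical RTTs (ms) between three datacenters.
rtt = [[0, 100, 150],
       [100, 0, 80],
       [150, 80, 0]]

assert satisfies_lower_bound([60, 40, 90], rtt)      # feasible assignment
assert not satisfies_lower_bound([40, 40, 40], rtt)  # violates the bound
```

Note that the bound constrains pairs jointly: one datacenter can commit faster only if its peers commit slower, which is the trade-off Helios exploits.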
utility and cloud computing | 2012
Chris Bunch; Vaibhav Arora; Navraj Chohan; Chandra Krintz; Shashank Hegde; Ankit Srivastava
In this paper we present the design, implementation, and evaluation of a pluggable autoscaler within an open cloud platform-as-a-service (PaaS). We redefine high availability (HA) as the dynamic use of virtual machines to keep services available to users, making it a subset of elasticity (the dynamic use of virtual machines). This makes it possible to investigate autoscalers that simultaneously address HA and elasticity. We present and evaluate autoscalers within this pluggable system that are HA-aware and Quality-of-Service (QoS)-aware for web applications written in different programming languages. Hot spares can also be utilized to provide HA and improve QoS for web users. Within the open source AppScale PaaS, hot spares can increase the amount of web traffic that the QoS-aware autoscaler serves to users by up to 32%. Because this autoscaling system operates at the PaaS layer, it is able to control virtual machines and be cost-aware when addressing HA and QoS. This cost awareness uses Spot Instances within Amazon EC2 to reduce the cost of machines acquired by 91%, in exchange for increased startup time. This pluggable autoscaling system facilitates the investigation of new autoscaling algorithms that can take advantage of metrics provided by different levels of the cloud stack.
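The HA/QoS decision logic described above can be sketched as a single policy function. This is a hypothetical illustration in the spirit of the paper's pluggable autoscalers, not the AppScale API; all names and thresholds are assumptions:

```python
# Illustrative HA- and QoS-aware autoscaler policy: maintain a minimum
# replica count for HA, and use hot spares (fast) or new VMs (slow) to
# restore QoS when latency exceeds the target.

def scaling_decision(active_vms, min_ha_replicas,
                     avg_latency_ms, qos_target_ms, hot_spares):
    # HA: never drop below the minimum number of live replicas.
    if active_vms < min_ha_replicas:
        return "scale_up"
    # QoS: if latency exceeds the target, prefer promoting a hot spare,
    # since acquiring a fresh VM (e.g. a cheap spot instance) is slower.
    if avg_latency_ms > qos_target_ms:
        return "promote_hot_spare" if hot_spares > 0 else "scale_up"
    return "steady"

assert scaling_decision(1, 2, 50, 200, 0) == "scale_up"          # HA violated
assert scaling_decision(3, 2, 350, 200, 1) == "promote_hot_spare"
assert scaling_decision(3, 2, 350, 200, 0) == "scale_up"
assert scaling_decision(3, 2, 50, 200, 1) == "steady"
```

A pluggable design would let each such policy be swapped in behind a common interface, which is what makes experimenting with new algorithms cheap.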
international conference on management of data | 2015
Aaron J. Elmore; Vaibhav Arora; Rebecca Taft; Andrew Pavlo; Divyakant Agrawal; Amr El Abbadi
For data-intensive applications with many concurrent users, modern distributed main memory database management systems (DBMS) provide the necessary scale-out support beyond what is possible with single-node systems. These DBMSs are optimized for the short-lived transactions that are common in on-line transaction processing (OLTP) workloads. One way that they achieve this is to partition the database into disjoint subsets and use a single-threaded transaction manager per partition that executes transactions one-at-a-time in serial order. This minimizes the overhead of concurrency control mechanisms, but requires careful partitioning to limit distributed transactions that span multiple partitions. Previous methods used off-line analysis to determine how to partition data, but the dynamic nature of these applications means that they are prone to hotspots. In these situations, the DBMS needs to reconfigure how data is partitioned in real-time to maintain performance objectives. Bringing the system off-line to reorganize the database is unacceptable for on-line applications. To overcome this problem, we introduce the Squall technique for supporting live reconfiguration in partitioned, main memory DBMSs. Squall supports fine-grained repartitioning of databases in the presence of distributed transactions, high throughput client workloads, and replicated data. An evaluation of our approach on a distributed DBMS shows that Squall can reconfigure a database with no downtime and minimal overhead on transaction latency.
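The core of live reconfiguration is moving tuples in small increments so transactions keep executing throughout. A minimal sketch of this idea (hypothetical code, not Squall's implementation, which also handles distributed transactions and replication):

```python
# Fine-grained live repartitioning: keys migrate in small batches, and
# between batches the system remains online to serve transactions.

class Partition:
    def __init__(self, data):
        self.data = dict(data)

def reconfigure(src, dst, keys_to_move, batch_size=1):
    """Move keys from src to dst in small batches; yield between batches
    so transaction processing can interleave with the migration."""
    pending = list(keys_to_move)
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        for k in batch:
            dst.data[k] = src.data.pop(k)
        yield pending  # remaining keys still owned by src

src = Partition({"a": 1, "b": 2, "c": 3})
dst = Partition({})
for _ in reconfigure(src, dst, ["a", "b"]):
    pass  # in a real system, transactions execute here
assert dst.data == {"a": 1, "b": 2}
assert src.data == {"c": 3}
```

The batch size bounds the per-step overhead on transaction latency, which is the knob that lets reconfiguration proceed with no downtime.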
very large data bases | 2014
Hatem A. Mahmoud; Vaibhav Arora; Faisal Nawab; Divyakant Agrawal; Amr El Abbadi
The past decade has witnessed an increasing adoption of cloud database technology, which provides better scalability, availability, and fault-tolerance via transparent partitioning and replication, and automatic load balancing and fail-over. However, only a small number of cloud databases provide strong consistency guarantees for distributed transactions, despite decades of research on distributed transaction processing, due to practical challenges that arise in the cloud setting, where failures are the norm, and human administration is minimal. For example, dealing with locks left by transactions initiated by failed machines, and determining a multi-programming level that avoids thrashing without under-utilizing available resources, are some of the challenges that arise when using lock-based transaction processing mechanisms in the cloud context. Even in the case of optimistic concurrency control, most proposals in the literature deal with distributed validation but still require the database to acquire locks during two-phase commit when installing updates of a single transaction on multiple machines. Very little theoretical work has been done to entirely eliminate the need for locking in distributed transactions, including locks acquired during two-phase commit. In this paper, we re-design optimistic concurrency control to eliminate any need for locking even for atomic commitment, while handling the practical issues in earlier theoretical work related to this problem. We conduct an extensive experimental study to evaluate our approach against lock-based methods under various setups and workloads, and demonstrate that our approach provides many practical advantages in the cloud context.
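The optimistic approach described above separates execution from validation: reads record versions, and commit succeeds only if nothing read has since changed. A minimal single-node sketch of version-based optimistic validation (an illustration of the general technique, not the paper's distributed, lock-free commit protocol):

```python
# Optimistic concurrency control with versioned records: read phase
# records versions, validation aborts on any stale read, and writes
# install new versions (atomically, in a real system).

store = {"x": (0, 10), "y": (0, 20)}  # key -> (version, value)

def execute(read_keys):
    """Read phase: remember the version of every item read."""
    return {k: store[k][0] for k in read_keys}

def try_commit(read_versions, writes):
    """Validation + write phase: abort if any read item changed."""
    if any(store[k][0] != v for k, v in read_versions.items()):
        return False
    for k, val in writes.items():
        ver = store[k][0] if k in store else -1
        store[k] = (ver + 1, val)
    return True

snapshot = execute(["x"])
assert try_commit(snapshot, {"x": 11})      # no conflict: commits
assert not try_commit(snapshot, {"x": 12})  # stale snapshot: aborts
```

The hard part the paper addresses is doing this validation and installation across machines without the locks conventionally taken during two-phase commit.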
extending database technology | 2015
Faisal Nawab; Vaibhav Arora; Divyakant Agrawal; Amr El Abbadi
Web-based applications face unprecedented workloads, demanding the processing of large numbers of events that reach millions per second. Developers are therefore increasingly relying on scalable cloud platforms to implement cloud applications. A shared log is essential for many tasks such as bookkeeping, recovery, and debugging: logs offer linearizability and simple append and read operations over immutable records, which facilitates building complex systems like stream processors and transaction managers. Current shared log infrastructures, however, serialize log records through a centralized server, limiting throughput to that of a single machine. We propose a novel distributed log store, called the Fractal Log Store (FLStore), that overcomes this single point of contention; FLStore maintains the log within the datacenter. We also propose Chariots, which provides multi-datacenter replication for shared logs and leverages FLStore as the log store. Chariots maintains causal ordering of records in the log and has a scalable design that allows elastic expansion of resources. As a cloud platform, Chariots exposes the shared log to cloud applications and offers fault-tolerance, persistence, and high availability transparently.
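The interface a shared log exposes is deliberately small: append an immutable record and get back its position, or read a record by position. A toy sketch of that interface (class and method names are hypothetical, and this omits the distribution, replication, and ordering machinery that FLStore and Chariots provide):

```python
# Toy append-only log: immutable records, positions assigned at append
# time, reads by position.

class SharedLog:
    def __init__(self):
        self._records = []

    def append(self, record):
        """Append an immutable record; return its log position."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, pos):
        """Read the record at a given position; records never change."""
        return self._records[pos]

log = SharedLog()
p0 = log.append({"op": "put", "key": "a", "value": 1})
p1 = log.append({"op": "put", "key": "a", "value": 2})
assert p0 == 0 and p1 == 1
assert log.read(p0)["value"] == 1  # earlier records are immutable
```

Because the only mutation is append, higher layers (stream processors, transaction managers) can replay any prefix deterministically, which is what makes the log such a convenient building block.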
international conference on distributed computing systems | 2017
Vaibhav Arora; Faisal Nawab; Divyakant Agrawal; Amr El Abbadi
Internet of Things (IoT) applications such as smart cars, smart cities, and wearables are becoming widespread and are the future of the Internet. One of the major challenges for IoT applications is efficiently processing, storing, and analyzing the continuous stream of incoming data from a large number of connected sensors. We propose a multi-representation-based data processing architecture for IoT applications. The data is stored in multiple representations, such as rows, columns, and graphs, providing support for diverse application demands. A unifying update mechanism based on deterministic scheduling is used to update the data representations, which completely removes the need for data transfer pipelines like ETL (Extract, Transform and Load). The combination of multiple representations and the deterministic update mechanism supports real-time analytics and caters to IoT applications by minimizing the latency of operations such as computing pre-defined aggregates.
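The deterministic-update idea can be sketched in a few lines: a single ordered batch of updates is applied independently to every representation, so all copies converge without a separate ETL pipeline. A hypothetical illustration (names and structures are assumptions, not the paper's implementation):

```python
# Deterministic updates across representations: the same ordered batch
# is applied to both a row-oriented and a column-oriented view, keeping
# them consistent without a data transfer pipeline.

row_store, column_store = {}, {}

def apply_batch(batch):
    """Apply one deterministic order of updates to every representation."""
    for key, field, value in batch:
        row_store.setdefault(key, {})[field] = value       # row view
        column_store.setdefault(field, {})[key] = value    # column view

apply_batch([("sensor1", "temp", 21), ("sensor2", "temp", 25)])
assert row_store["sensor1"]["temp"] == 21
assert column_store["temp"] == {"sensor1": 21, "sensor2": 25}
```

Since every representation sees updates in the same deterministic order, analytical reads over the column view reflect exactly the same state as transactional reads over the row view.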
IEEE Transactions on Knowledge and Data Engineering | 2018
Vaibhav Arora; Faisal Nawab; Divyakant Agrawal; Amr El Abbadi
Cloud-based data-intensive applications have to process high volumes of transactional and analytical requests on large-scale data. Businesses base their decisions on the results of analytical requests, creating a need for real-time analytical processing. We propose Janus, a hybrid scalable cloud datastore that enables the efficient execution of diverse workloads by storing data in different representations. Janus manages big datasets in the context of datacenters, supporting scale-out by partitioning the data across multiple servers; this requires Janus to efficiently support distributed transactions. To meet differing datacenter requirements, Janus also allows diverse partitioning strategies for the different representations. Janus introduces a novel data movement pipeline that continuously keeps the different representations up to date. Unlike existing multi-representation storage systems and Change Data Capture (CDC) pipelines, the data movement pipeline in Janus supports partitioning and handles both distributed transactions and diverse partitioning strategies. In this paper, we focus on supporting Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) workloads, and hence use row- and column-oriented representations, which are the most efficient representations for these workloads. Our evaluations on Amazon AWS illustrate that Janus can provide real-time analytical results, in addition to processing high-throughput transactional workloads.
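"Diverse partitioning strategies" means each representation can be placed across servers differently: the row view hash-partitioned by key for OLTP point access, the column view partitioned by attribute so OLAP scans stay local. A hypothetical sketch of the placement logic only (not the Janus implementation or its data movement pipeline):

```python
# Two representations, two partitioning strategies: rows are hash-
# partitioned by key, columns are co-located by attribute.

def row_partition(key, n_servers):
    """OLTP side: spread keys uniformly across servers."""
    return hash(key) % n_servers

def column_partition(attribute, layout):
    """OLAP side: each column lives on the server the layout assigns."""
    return layout[attribute]

layout = {"price": 0, "quantity": 1}
assert 0 <= row_partition("order:42", 4) < 4  # some server in [0, 4)
assert column_partition("price", layout) == 0
```

The cost of this flexibility is that a single logical update may land on different servers in different representations, which is why the data movement pipeline must handle distributed transactions.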
international conference on big data and smart computing | 2015
Divyakant Agrawal; Amr El Abbadi; Vaibhav Arora; Ceren Budak; Theodore Georgiou; Hatem A. Mahmoud; Faisal Nawab; Cetin Sahin; Shiyuan Wang
With large elastic and scalable infrastructures, the Cloud is the ideal storage repository for Big Data applications. Big Data is typically characterized by three Vs: Volume, Variety, and Velocity. Supporting these properties raises significant challenges in a cloud setting, including partitioning for scale-out; replication across data centers for fault-tolerance; significant latency overheads due to consistency requirements; efficient traversal needs due to high update rates and velocity; and continuous maintenance in the presence of a large variety of data representations. Last but not least, the storage of private data necessitates the ability to efficiently execute queries in a privacy-preserving manner (P), without revealing user access patterns. In this paper we highlight these challenges and illustrate sample state-of-the-art solutions.
international conference on cloud computing | 2018
Vaibhav Arora; Ravi Kumar Suresh Babu; Sujaya Maiyya; Divyakant Agrawal; Amr El Abbadi; Xun Xue; Yanan Zhi; Jianfeng Zhu
usenix conference on hot topics in cloud computing | 2017
Vaibhav Arora; Tanuj Mittal; Divyakant Agrawal; Amr El Abbadi; Xun Xue; Yanan Zhi; Jianfeng Zhu