Shivakumar Venkataraman
Publications
Featured research published by Shivakumar Venkataraman.
international conference on management of data | 2013
Rajagopal Ananthanarayanan; Venkatesh Basker; Sumit Das; Ashish Gupta; Haifeng Jiang; Tianhao Qiu; Alexey Reznichenko; Deomid Ryabkov; Manpreet Singh; Shivakumar Venkataraman
Web-based enterprises process events generated by millions of users interacting with their websites. Rich statistical data distilled from combining such interactions in near real-time generates enormous business value. In this paper, we describe the architecture of Photon, a geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency, where the streams may be unordered or delayed. The system fully tolerates infrastructure degradation and datacenter-level outages without any manual intervention. Photon guarantees that there will be no duplicates in the joined output (at-most-once semantics) at any point in time, that most joinable events will be present in the output in real-time (near-exact semantics), and exactly-once semantics eventually. Photon is deployed within Google Advertising System to join data streams such as web search queries and user clicks on advertisements. It produces joined logs that are used to derive key business metrics, including billing for advertisers. Our production deployment processes millions of events per minute at peak with an average end-to-end latency of less than 10 seconds. We also present challenges and solutions in maintaining large persistent state across geographically distant locations, and highlight the design principles that emerged from our experience.
very large data bases | 2014
Ashish Gupta; Fan Yang; Jason Govig; Adam Kirsch; Kelvin K. Chan; Kevin Lai; Shuo Wu; Sandeep Govind Dhoot; Abhilash Rajesh Kumar; Ankur Agiwal; Sanjay Bhansali; Mingsheng Hong; Jamie Cameron; Masood Siddiqi; David Jones; Jeff Shute; Andrey Gubarev; Shivakumar Venkataraman; Divyakant Agrawal
Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Specifically, Mesa handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable query answers at low latency, even when an entire datacenter fails. This paper presents the Mesa system and reports the performance and scale that it achieves.
Communications of The ACM | 2016
Ashish Gupta; Fan Yang; Jason Govig; Adam Kirsch; Kelvin K. Chan; Kevin Lai; Shuo Wu; Sandeep Govind Dhoot; Abhilash Rajesh Kumar; Ankur Agiwal; Sanjay Bhansali; Mingsheng Hong; Jamie Cameron; Masood Siddiqi; David Jones; Jeff Shute; Andrey Gubarev; Shivakumar Venkataraman; Divyakant Agrawal
Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and retrieval, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Specifically, Mesa handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable query answers at low latency, even when an entire datacenter fails. This paper presents the Mesa system and reports the performance and scale that it achieves.
very large data bases | 2014
Shivakumar Venkataraman; Divyakant Agrawal
In this collaborative keynote address, we will share Google's experience in building a scalable data infrastructure that leverages datacenters for managing Google's advertising data over the last decade. In order to support the massive online advertising platform at Google, the data infrastructure must simultaneously support both transactional and analytical workloads. The focus of this talk will be to highlight how the datacenter architecture and the cloud computing paradigm have enabled us to manage the exponential growth in data volumes and user queries, make our services highly available and fault tolerant to massive datacenter outages, and deliver results with very low latencies. We note that other Internet companies have also undergone similar growth in data volumes and user queries. In fact, this phenomenon has resulted in at least two new terms in the technology lexicon: big data and cloud computing. Cloud computing (and datacenters) have been largely responsible for scaling data volumes from the terabyte range just a few years ago to the exabyte range expected over the next couple of years. Delivering solutions at this scale that are fault-tolerant, latency-sensitive, and highly available requires a combination of research advances with engineering ingenuity at Google and elsewhere. Next, we will try to answer the following question: is a datacenter just another (very large) computer? Or does it fundamentally change the design principles for data-centric applications and systems? We will conclude with some of the unique research challenges that need to be addressed in order to sustain continuous growth in data volumes while supporting high throughput and low latencies.
Archive | 2009
Ramananthan V. Guha; Shivakumar Venkataraman; Vineet Gupta; Gokay Baris Gultekin; Pradnya Karbhari; Abhinav Jalan
Archive | 2011
Weipeng Yan; Shivakumar Venkataraman; Anshul Kothari
Archive | 2011
Roberto J. Bayardo; Uma Mahadevan; Giao Nguyen; Shivakumar Venkataraman; Adam Isaac Juda
Archive | 2017
Shivakumar Venkataraman; Ramakrishnan Srikant; Anshul Kothari; Aranyak Mehta; Vivek Raghunathan; Nagbhushan Veerapaneni; Abhishek Bapna; Adam Isaac Juda
international conference on management of data | 2016
Gokul Nath Babu Manoharan; Stephan Ellner; Karl Schnaitter; Sridatta Chegu; Alejandro Estrella-Balderrama; Stephan Gudmundson; Apurv Gupta; Ben Handy; Bart Samwel; Chad Whipkey; Larysa Aharkava; Himani Apte; Nitin Gangahar; Jun Xu; Shivakumar Venkataraman; Divyakant Agrawal; Jeffrey D. Ullman
Archive | 2016
Leora Ruth Wiseman; Shivakumar Venkataraman; Sridhar Ramaswamy