Shivakumar Venkataraman

Publication

Featured research published by Shivakumar Venkataraman.

international conference on management of data | 2013

Photon: fault-tolerant and scalable joining of continuous data streams

Rajagopal Ananthanarayanan; Venkatesh Basker; Sumit Das; Ashish Gupta; Haifeng Jiang; Tianhao Qiu; Alexey Reznichenko; Deomid Ryabkov; Manpreet Singh; Shivakumar Venkataraman

Web-based enterprises process events generated by millions of users interacting with their websites. Rich statistical data distilled from combining such interactions in near real-time generates enormous business value. In this paper, we describe the architecture of Photon, a geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency, where the streams may be unordered or delayed. The system fully tolerates infrastructure degradation and datacenter-level outages without any manual intervention. Photon guarantees that there will be no duplicates in the joined output (at-most-once semantics) at any point in time, that most joinable events will be present in the output in real-time (near-exact semantics), and exactly-once semantics eventually. Photon is deployed within Google Advertising System to join data streams such as web search queries and user clicks on advertisements. It produces joined logs that are used to derive key business metrics, including billing for advertisers. Our production deployment processes millions of events per minute at peak with an average end-to-end latency of less than 10 seconds. We also present challenges and solutions in maintaining large persistent state across geographically distant locations, and highlight the design principles that emerged from our experience.
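The guarantees named in the abstract (no duplicates in the joined output, with late or unordered events joined eventually) can be illustrated with a minimal single-process sketch. This is not Photon's implementation: the class `StreamJoiner`, its method names, and the in-memory `joined_ids` set are hypothetical stand-ins chosen for illustration, and the sketch ignores distribution, persistence, and datacenter failover entirely.

```python
from dataclasses import dataclass, field


@dataclass
class StreamJoiner:
    """Toy joiner for two streams (queries and clicks). All names are hypothetical."""
    queries: dict = field(default_factory=dict)    # query_id -> query payload
    joined_ids: set = field(default_factory=set)   # click_ids already written to output
    pending: dict = field(default_factory=dict)    # query_id -> clicks waiting for that query
    output: list = field(default_factory=list)     # joined (click_id, query, click) records

    def on_query(self, query_id, payload):
        """Record a query, then join any clicks that arrived before it."""
        self.queries[query_id] = payload
        for click_id, click in self.pending.pop(query_id, []):
            self._emit(click_id, query_id, click)

    def on_click(self, click_id, query_id, click):
        """Join a click now if its query is known; otherwise park it for later."""
        if click_id in self.joined_ids:
            return  # already joined: drop the redelivered event
        if query_id in self.queries:
            self._emit(click_id, query_id, click)
        else:
            self.pending.setdefault(query_id, []).append((click_id, click))

    def _emit(self, click_id, query_id, click):
        """Write a joined record at most once per click_id."""
        if click_id in self.joined_ids:
            return
        self.joined_ids.add(click_id)
        self.output.append((click_id, self.queries[query_id], click))
```

In this sketch a click that arrives before its query is held in `pending` and joined when the query shows up, while the `joined_ids` check ensures that a redelivered click never produces a second output record.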

very large data bases | 2014

Mesa: geo-replicated, near real-time, scalable data warehousing

Ashish Gupta; Fan Yang; Jason Govig; Adam Kirsch; Kelvin K. Chan; Kevin Lai; Shuo Wu; Sandeep Govind Dhoot; Abhilash Rajesh Kumar; Ankur Agiwal; Sanjay Bhansali; Mingsheng Hong; Jamie Cameron; Masood Siddiqi; David Jones; Jeff Shute; Andrey Gubarev; Shivakumar Venkataraman; Divyakant Agrawal

Communications of The ACM | 2016

Mesa: a geo-replicated online data warehouse for Google's advertising system

Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and retrieval, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Specifically, Mesa handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable query answers at low latency, even when an entire datacenter fails. This paper presents the Mesa system and reports the performance and scale that it achieves.

very large data bases | 2014

Datacenters as computers: Google engineering & database research perspectives

Shivakumar Venkataraman; Divyakant Agrawal

In this collaborative keynote address, we will share Google's experience in building a scalable data infrastructure that leverages datacenters for managing Google's advertising data over the last decade. In order to support the massive online advertising platform at Google, the data infrastructure must simultaneously support both transactional and analytical workloads. The focus of this talk will be to highlight how the datacenter architecture and the cloud computing paradigm have enabled us to manage the exponential growth in data volumes and user queries, make our services highly available and fault tolerant to massive datacenter outages, and deliver results with very low latencies. We note that other Internet companies have also undergone similar growth in data volumes and user queries. In fact, this phenomenon has resulted in at least two new terms in the technology lexicon: big data and cloud computing. Cloud computing (and datacenters) have been largely responsible for scaling data volumes from the terabyte range just a few years ago to the exabyte range expected over the next couple of years. Delivering solutions at this scale that are fault-tolerant, latency sensitive, and highly available requires a combination of research advances and engineering ingenuity at Google and elsewhere. Next, we will try to answer the following question: is a datacenter just another (very large) computer? Or does it fundamentally change the design principles for data-centric applications and systems? We will conclude with some of the unique research challenges that need to be addressed in order to sustain continuous growth in data volumes while supporting high throughput and low latencies.

Archive | 2009

Query identification and association

Ramanathan V. Guha; Shivakumar Venkataraman; Vineet Gupta; Gokay Baris Gultekin; Pradnya Karbhari; Abhinav Jalan

Archive | 2011

Determining and displaying impression share to advertisers

Weipeng Yan; Shivakumar Venkataraman; Anshul Kothari

Archive | 2011

Tie breaking rules for content item matching

Roberto J. Bayardo; Uma Mahadevan; Giao Nguyen; Shivakumar Venkataraman; Adam Isaac Juda

Archive | 2017

Adjusting participation of content in a selection process

Shivakumar Venkataraman; Ramakrishnan Srikant; Anshul Kothari; Aranyak Mehta; Vivek Raghunathan; Nagbhushan Veerapaneni; Abhishek Bapna; Adam Isaac Juda

international conference on management of data | 2016

Shasta: Interactive Reporting At Scale

Gokul Nath Babu Manoharan; Stephan Ellner; Karl Schnaitter; Sridatta Chegu; Alejandro Estrella-Balderrama; Stephan Gudmundson; Apurv Gupta; Ben Handy; Bart Samwel; Chad Whipkey; Larysa Aharkava; Himani Apte; Nitin Gangahar; Jun Xu; Shivakumar Venkataraman; Divyakant Agrawal; Jeffrey D. Ullman

Archive | 2016