Publication


Featured research published by Erik Vee.


international conference on data engineering | 2008

Efficient Computation of Diverse Query Results

Erik Vee; Utkarsh Srivastava; Jayavel Shanmugasundaram; Prashant Bhat; Sihem Amer-Yahia

We study the problem of efficiently computing diverse query results in online shopping applications, where users specify queries through a form interface that allows a mix of structured and content-based selection conditions. Intuitively, the goal of diverse query answering is to return a representative set of top-k answers from all the tuples that satisfy the user selection condition. For example, if a user is searching for Honda cars and we can only display five results, we wish to return cars from five different Honda models, as opposed to returning cars from only one or two Honda models. A key contribution of this paper is to formally define the notion of diversity, and to show that existing score-based techniques commonly used in web applications are not sufficient to guarantee diversity. Another contribution of this paper is to develop novel and efficient query processing techniques that guarantee diversity. Our experimental results using Yahoo! Autos data show that our proposed techniques are scalable and efficient.
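
To make the diversity goal concrete, here is a minimal sketch, not the paper's query-processing technique: it greedily round-robins the highest-scoring tuples across distinct values of one attribute (the car model in the example), so each displayed slot comes from a different group for as long as distinct groups remain. The attribute name, scoring function, and data are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's algorithm): greedy round-robin
# diversification of top-scoring tuples across distinct values of one attribute.
from collections import defaultdict, deque

def diverse_top_k(tuples, attr, score, k):
    """Pick up to k tuples, cycling across distinct values of `attr` so that
    high-scoring items from different groups are all represented."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[attr]].append(t)
    for g in groups.values():
        g.sort(key=score, reverse=True)          # best item of each group first
    queue = deque(sorted(groups.values(), key=lambda g: score(g[0]), reverse=True))
    result = []
    while queue and len(result) < k:
        g = queue.popleft()
        result.append(g.pop(0))                  # take this group's best remaining item
        if g:
            queue.append(g)                      # round-robin: group goes to the back
    return result

cars = [{"model": "Civic", "price": 9000}, {"model": "Civic", "price": 8500},
        {"model": "Accord", "price": 12000}, {"model": "Fit", "price": 7000}]
print(diverse_top_k(cars, "model", lambda t: -t["price"], 3))   # one car per model
```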


electronic commerce | 2010

Optimal online assignment with forecasts

Erik Vee; Sergei Vassilvitskii; Jayavel Shanmugasundaram

Motivated by the allocation problem facing publishers in display advertising, we formulate the online assignment with forecast problem, a version of the online allocation problem where the algorithm has access to random samples from the future set of arriving vertices. We provide a solution that allows us to serve Internet users in an online manner that is provably nearly optimal. Our technique applies to the forecast version of a large class of online assignment problems, such as online bipartite matching, allocation, and budgeted bidders, in which we wish to minimize the value of some convex objective function subject to a set of linear supply and demand constraints. Our solution utilizes a particular subspace of the dual space, allowing us to describe the optimal primal solution implicitly in space proportional to the demand side of the input graph. More importantly, it allows us to prove that representing the primal solution using such a compact allocation plan yields a robust online algorithm which makes near-optimal online decisions. Furthermore, unlike the primal solution, we show that the compact allocation plan produced by considering only a sampled version of the original problem generalizes to produce a near-optimal solution on the full problem instance.
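
As a rough picture of what serving from a compact, dual-space plan looks like, the sketch below fits one dual value per contract against a forecast sample using simple subgradient updates, and then makes every online decision from those duals alone. The input shapes (a contract-demand map and per-impression eligibility/value dicts) and the fitting procedure are assumptions for illustration, not the paper's algorithm or its guarantees.

```python
# Hedged illustration of a dual-based "compact allocation plan": one learned
# number per contract, from which all online decisions are derived. Simplified
# stand-in, not the paper's method.
def fit_duals(sample, contracts, rounds=100, lr=0.1):
    """sample: list of {contract: value} dicts (eligible contracts per forecast impression);
    contracts: {contract: demand in impressions over the sample horizon}."""
    alpha = {c: 0.0 for c in contracts}
    target = {c: contracts[c] / len(sample) for c in contracts}    # demand as a fraction of traffic
    for _ in range(rounds):
        served = {c: 0 for c in contracts}
        for imp in sample:
            best = max(imp, key=lambda c: imp[c] - alpha[c], default=None)
            if best is not None and imp[best] - alpha[best] > 0:
                served[best] += 1
        for c in contracts:                                        # push duals toward matching demand
            alpha[c] = max(0.0, alpha[c] + lr * (served[c] / len(sample) - target[c]))
    return alpha

def serve(impression, alpha):
    """Online decision made from the compact plan (the duals) only."""
    best = max(impression, key=lambda c: impression[c] - alpha[c], default=None)
    return best if best is not None and impression[best] - alpha[best] > 0 else None

contracts = {"A": 30, "B": 20}                                     # impressions each contract needs
sample = [{"A": 1.0}, {"A": 0.8, "B": 0.6}, {"B": 0.9}] * 20       # forecast sample of eligible values
alpha = fit_duals(sample, contracts)
print(serve({"A": 0.7, "B": 0.9}, alpha))
```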


international conference on management of data | 2008

Efficient bulk insertion into a distributed ordered table

Adam Silberstein; Brian F. Cooper; Utkarsh Srivastava; Erik Vee; Ramana Yerneni; Raghu Ramakrishnan

We study the problem of bulk-inserting records into tables in a system that horizontally range-partitions data over a large cluster of shared-nothing machines. Each table partition contains a contiguous portion of the table's key range, and must accept all records inserted into that range. Examples of such systems include BigTable [8] at Google, and PNUTS [15] at Yahoo! During bulk inserts into an existing table, if most of the inserted records end up going into a small number of data partitions, the obtained throughput may be very poor due to ineffective use of cluster parallelism. We propose a novel approach in which a planning phase is invoked before the actual insertions. By creating new partitions and intelligently distributing partitions across machines, the planning phase ensures that the insertion load will be well-balanced. Since there is a tradeoff between the cost of moving partitions and the resulting throughput gain, the planning phase must minimize the sum of partition movement time and insertion time. We show that this problem is a variation of NP-hard bin-packing, reduce it to a problem of packing vectors, and then give a solution with provable approximation guarantees. We evaluate our approach on a prototype system deployed on a cluster of 50 machines, and show that it yields significant improvements over more naïve techniques.
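
The toy sketch below caricatures such a planning phase: estimate per-partition insert load from a sample of the keys to be inserted, split any partition that would absorb too large a share, then place partitions greedily on the least-loaded machine. The paper formulates this as a vector-packing problem and gives approximation guarantees; this greedy version only conveys the shape of the plan, and all parameters are illustrative.

```python
# Toy planning phase (illustration only): sample-based load estimation, hot-spot
# splitting, and greedy placement. The paper's vector-packing formulation and
# approximation guarantees are not reproduced here.
import bisect
import heapq

def plan(sample_keys, boundaries, machines, max_share=0.1):
    """boundaries: sorted split keys of the existing range partitions."""
    load = [0] * (len(boundaries) + 1)
    for key in sample_keys:                       # which partition would each sampled insert hit?
        load[bisect.bisect_right(boundaries, key)] += 1
    total = max(1, len(sample_keys))
    parts = []
    for l in load:                                # split partitions that would take > max_share of inserts
        pieces = max(1, round(l / (max_share * total)))
        parts.extend([l / pieces] * pieces)
    heap = [(0.0, m) for m in range(machines)]    # (current load, machine id)
    heapq.heapify(heap)
    placement = [None] * len(parts)
    for idx in sorted(range(len(parts)), key=lambda i: -parts[i]):
        cur, m = heapq.heappop(heap)              # heaviest remaining partition -> lightest machine
        placement[idx] = m
        heapq.heappush(heap, (cur + parts[idx], m))
    return placement

print(plan(sample_keys=list(range(100)), boundaries=[25, 50, 75], machines=4))
```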


international conference on management of data | 2010

Efficiently evaluating complex boolean expressions

Marcus Fontoura; Suhas Sadanandan; Jayavel Shanmugasundaram; Sergei Vassilvitskii; Erik Vee; Srihari Venkatesan; Jason Zien

The problem of efficiently evaluating a large collection of complex Boolean expressions - beyond simple conjunctions and Disjunctive/Conjunctive Normal Forms (DNF/CNF) - occurs in many emerging online advertising applications such as advertising exchanges and automatic targeting. The simple solution of normalizing complex Boolean expressions to DNF or CNF form, and then using existing methods for evaluating such expressions is not always effective because of the exponential blow-up in the size of expressions due to normalization. We thus propose a novel method for evaluating complex expressions, which leverages existing techniques for evaluating leaf-level conjunctions, and then uses a bottom-up evaluation technique to only process the relevant parts of the complex expressions that contain the matching conjunctions. We develop two such bottom-up evaluation techniques, one based on Dewey IDs and another based on mapping Boolean expressions to one-dimensional intervals. Our experimental evaluation based on data obtained from an online advertising exchange shows that the proposed techniques are efficient and scalable with respect to both space usage and evaluation time.
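
For orientation only, the skeleton below shows the bottom-up shape of such an evaluator, assuming a leaf-level index has already produced the set of matched conjunction IDs for an incoming event: the complex expression is an AND/OR/NOT tree over those leaves. The Dewey-ID and interval-based techniques in the paper, which avoid touching irrelevant parts of the expressions, are not captured by this sketch.

```python
# Bottom-up evaluation of an AND/OR/NOT tree whose leaves are conjunction IDs
# matched by some existing leaf-level matcher. Skeleton for intuition only.
def evaluate(node, matched_leaves):
    op = node[0]
    if op == "leaf":
        return node[1] in matched_leaves          # did the leaf-level matcher fire this conjunction?
    if op == "not":
        return not evaluate(node[1], matched_leaves)
    if op == "and":
        return all(evaluate(c, matched_leaves) for c in node[1])
    if op == "or":
        return any(evaluate(c, matched_leaves) for c in node[1])
    raise ValueError(f"unknown operator: {op}")

# (c1) OR NOT (c2), where c1 and c2 are leaf conjunctions such as "age>30 AND state=CA".
expr = ("or", [("leaf", "c1"), ("not", ("leaf", "c2"))])
print(evaluate(expr, matched_leaves={"c2"}))      # False: c1 did not match and c2 did
```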


workshop on internet and network economics | 2007

Cost of conciseness in sponsored search auctions

Zoë Abrams; Arpita Ghosh; Erik Vee

The generalized second price auction used in sponsored search has been analyzed for models where bidders value clicks on ads. However, advertisers do not derive value only from clicks, nor do they value clicks in all slots equally. There is a need to understand sponsored search auctions in a setting with more general bidder valuations, in order to encompass realistic advertising objectives such as branding and conversions. We investigate the practical scenario where bidders have a full spectrum of values for slots, which are not necessarily proportional to the expected number of clicks received, and report a single scalar bid to the generalized second price auction. We show that there always exists an equilibrium corresponding to the VCG outcome using these full vector values, under monotonicity conditions on the valuations of bidders and clickthrough rates. Further, we discuss the problem of bidding strategies leading to such efficient equilibria: contrary to the case when bidders have one-dimensional types, bidding strategies with reasonable restrictions on bid values do not exist.
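
For a sense of what "the VCG outcome using these full vector values" means, the snippet below computes VCG assignments and payments by brute force for a small slot auction in which each bidder holds an arbitrary value for each slot; the numbers are invented, and this is a worked illustration rather than the paper's equilibrium analysis of the generalized second price auction.

```python
# Brute-force VCG for slot auctions with full per-slot values (values need not be
# proportional to clicks). Worked illustration only.
from itertools import permutations

def vcg(values):
    """values[i][j] = bidder i's value for slot j; assumes no more slots than bidders."""
    bidders, slots = len(values), len(values[0])

    def best(excluded=None):
        cands = [i for i in range(bidders) if i != excluded]
        top_w, top_a = 0.0, {}
        for perm in permutations(cands, min(slots, len(cands))):
            w = sum(values[i][j] for j, i in enumerate(perm))
            if w > top_w:
                top_w, top_a = w, {i: j for j, i in enumerate(perm)}
        return top_w, top_a

    _, assign = best()
    payments = {}
    for i in assign:
        others_without_i, _ = best(excluded=i)                     # welfare of the rest if i left
        others_with_i = sum(values[k][assign[k]] for k in assign if k != i)
        payments[i] = others_without_i - others_with_i             # the externality i imposes
    return assign, payments

# Three bidders, two slots; bidder 0 values the top slot disproportionately (e.g. branding).
print(vcg([[10, 2], [6, 5], [4, 3]]))   # ({0: 0, 1: 1}, {0: 4, 1: 3})
```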


ACM Transactions on Database Systems | 2008

Estimating statistical aggregates on probabilistic data streams

T. S. Jayram; Andrew McGregor; S. Muthukrishnan; Erik Vee

The probabilistic stream model was introduced by Jayram et al. [2007]. It is a generalization of the data stream model that is suited to handling probabilistic data, where each item of the stream represents a probability distribution over a set of possible events. Therefore, a probabilistic stream determines a distribution over a potentially exponential number of classical deterministic streams, where each item is deterministically one of the domain values. We present algorithms for computing commonly used aggregates on a probabilistic stream. We present the first one-pass streaming algorithms for estimating the expected mean of a probabilistic stream. Next, we consider the problem of estimating frequency moments for probabilistic data. We propose a general approach to obtain unbiased estimators working over probabilistic data by utilizing unbiased estimators designed for standard streams. Applying this approach, we extend a classical data stream algorithm to obtain a one-pass algorithm for estimating F2, the second frequency moment. We present the first known streaming algorithms for estimating F0, the number of distinct items, on probabilistic streams. Our work also gives an efficient one-pass algorithm for estimating the median, and a two-pass algorithm for estimating the range.
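
As a deliberately simplified example of one-pass aggregation over a probabilistic stream, the sketch below accumulates the expected SUM and expected COUNT, where each stream item is a partial distribution over values and the leftover probability mass means the item is absent. Note that the ratio of the two is not the expected mean in general, which is exactly the subtlety the paper's mean estimator has to handle; the data layout here is an assumption for illustration.

```python
# One pass over a probabilistic stream, accumulating expected SUM and expected
# COUNT. Each item is a list of (value, probability) pairs whose probabilities may
# sum to less than 1 (the remainder means "item not present"). Illustration only;
# E[SUM] / E[COUNT] is generally not the expected mean.
def expected_sum_and_count(stream):
    exp_sum = exp_count = 0.0
    for item in stream:
        exp_count += sum(p for _, p in item)     # probability the item is present at all
        exp_sum += sum(v * p for v, p in item)   # expected contribution to the sum
    return exp_sum, exp_count

stream = [[(3, 0.5), (7, 0.5)],                  # surely present, value 3 or 7
          [(10, 0.2)]]                           # present only with probability 0.2
print(expected_sum_and_count(stream))            # (7.0, 1.2)
```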


conference on information and knowledge management | 2010

Pricing guaranteed contracts in online display advertising

Vijay Bharadwaj; Wenjing Ma; Michael Schwarz; Jayavel Shanmugasundaram; Erik Vee; Jack Z. Xie; Jian Yang

We consider the problem of pricing guaranteed contracts in online display advertising. This problem has two key characteristics that when taken together distinguish it from related offline and online pricing problems: (1) the guaranteed contracts are sold months in advance, and at various points in time, and (2) the inventory that is sold to guaranteed contracts - user visits - is very high-dimensional, having hundreds of possible attributes, and advertisers can potentially buy any of the very large number (many trillions) of combinations of these attributes. Consequently, traditional pricing methods such as real-time or combinatorial auctions, or optimization-based pricing based on self- and cross-elasticities are not directly applicable to this problem. We hence propose a new pricing method, whereby the price of a guaranteed contract is computed based on the prices of the individual user visits that the contract is expected to get. The price of each individual user visit is in turn computed using historical sales prices that are negotiated between a sales person and an advertiser, and we propose two different variants in this context. Our evaluation using real guaranteed contracts shows that the proposed pricing method is accurate in the sense that it can effectively predict the prices of other (out-of-sample) historical contracts.
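
A bare-bones sketch of "price the contract from the user visits it is expected to get": the contract price is the average of per-visit prices over the visits the forecaster assigns to it. In the paper those per-visit prices are derived from historically negotiated contract prices; here the pricing function and attribute names are assumed inputs for illustration.

```python
# Illustrative only: a guaranteed contract priced as the average of the per-visit
# prices of the user visits it is forecast to receive. The per-visit price
# function is an assumed input (the paper derives it from historical sales).
def price_contract(forecast_visits, visit_price):
    """forecast_visits: attribute dicts for the visits the contract is expected to get."""
    if not forecast_visits:
        return 0.0
    return sum(visit_price(v) for v in forecast_visits) / len(forecast_visits)

visits = [{"geo": "US", "age": "18-34"}, {"geo": "US", "age": "35-54"}]
print(price_contract(visits, lambda v: 2.0 if v["age"] == "18-34" else 1.5))   # 1.75
```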


international conference on management of data | 2010

Forecasting high-dimensional data

Deepak Agarwal; Datong Chen; Long-Ji Lin; Jayavel Shanmugasundaram; Erik Vee

We propose a method for forecasting high-dimensional data (hundreds of attributes, trillions of attribute combinations) for a duration of several months. Our motivating application is guaranteed display advertising, a multi-billion dollar industry, whereby advertisers can buy targeted (high-dimensional) user visits from publishers many months or even years in advance. Forecasting high-dimensional data is challenging because of the many possible attribute combinations that need to be forecast. To address this issue, we propose a method whereby only a subset of attribute combinations are explicitly forecast and stored, while the other combinations are dynamically forecast on-the-fly using high-dimensional attribute correlation models. We evaluate various attribute correlation models, from simple models that assume the independence of attributes to more sophisticated sample-based models that fully capture the correlations in a high-dimensional space. Our evaluation using real-world display advertising data sets shows that fully capturing high-dimensional correlations leads to significant forecast accuracy gains. A variant of the proposed method has been implemented in the context of Yahoo!'s guaranteed display advertising system.
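
The simplest correlation model mentioned above, attribute independence, fits in a few lines: the forecast for any attribute combination is the total forecast scaled by the product of the per-attribute marginal shares. This sketch is illustrative only; the paper's sample-based models exist precisely because this assumption misses correlations between attributes.

```python
# Independence-model forecast for an arbitrary attribute combination: total
# forecast times the product of marginal shares. Illustration of the simplest
# model the abstract mentions, not the paper's sample-based correlation models.
def independence_forecast(total, marginals, query):
    """marginals: {attribute: {value: share of traffic}}; query: {attribute: value}."""
    estimate = total
    for attr, value in query.items():
        estimate *= marginals[attr].get(value, 0.0)
    return estimate

marginals = {"geo": {"US": 0.6, "UK": 0.4}, "age": {"18-34": 0.5, "35+": 0.5}}
print(independence_forecast(1_000_000, marginals, {"geo": "US", "age": "18-34"}))   # 300000.0
```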


web search and data mining | 2008

Connectivity structure of bipartite graphs via the KNC-plot

Ravi Kumar; Andrew Tomkins; Erik Vee

In this paper we introduce the k-neighbor connectivity plot, or KNC-plot, as a tool to study the macroscopic connectivity structure of sparse bipartite graphs. Given a bipartite graph G = (U, V, E), we say that two nodes in U are k-neighbors if there exist at least k distinct length-two paths between them; this defines a k-neighborhood graph on U where the edges are given by the k-neighbor relation. For example, in a bipartite graph of users and interests, two users are k-neighbors if they have at least k common interests. The KNC-plot shows the degradation of connectivity of the graph as a function of k. We show that this tool provides an effective and interpretable high-level characterization of the connectivity of a bipartite graph. However, naive algorithms to compute the KNC-plot are inefficient for k > 1. We give an efficient and practical algorithm that runs in sub-quadratic time O(|E|^(2-1/k)) and is a non-trivial improvement over the obvious quadratic-time algorithms for this problem. We prove significant improvements in this runtime for graphs with power-law degree distributions, and give a different algorithm with near-linear runtime when V grows slowly as a function of the size of the graph. We compute the KNC-plot of four large real-world bipartite graphs, and discuss the structural properties of these graphs that emerge. We conclude that the KNC-plot represents a useful and practical tool for macroscopic analysis of large bipartite graphs.
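
Straight from the definition, one point of the KNC-plot can be computed naively by materializing the k-neighbor graph on U and counting its connected components, as in the sketch below; this is the quadratic baseline that the paper's sub-quadratic algorithm improves on.

```python
# Naive KNC-plot point: build the k-neighbor graph on U (two U-nodes are adjacent
# if they share at least k neighbors in V) and count connected components with
# union-find. This is the quadratic baseline, not the paper's faster algorithm.
from collections import defaultdict
from itertools import combinations

def knc_components(edges, k):
    """edges: (u, v) pairs of a bipartite graph; returns #components of the k-neighbor graph on U."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
    nodes = list(nbrs)
    parent = {u: u for u in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]         # path halving
            x = parent[x]
        return x
    for a, b in combinations(nodes, 2):
        if len(nbrs[a] & nbrs[b]) >= k:           # at least k distinct length-two paths
            parent[find(a)] = find(b)
    return len({find(u) for u in nodes})

edges = [("u1", "x"), ("u1", "y"), ("u2", "x"), ("u2", "y"), ("u3", "y")]
print(knc_components(edges, 1), knc_components(edges, 2))   # 1 component at k=1, 2 at k=2
```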


algorithmic game theory | 2011

The multiple attribution problem in pay-per-conversion advertising

Patrick R. Jordan; Mohammad Mahdian; Sergei Vassilvitskii; Erik Vee

In recent years the online advertising industry has witnessed a shift from the more traditional pay-per-impression model to the pay-per-click and more recently to the pay-per-conversion model. Such models require the ad allocation engine to translate the advertiser's value per click/conversion to value per impression. This is often done through simple models that assume that each impression of the ad stochastically leads to a click/conversion independent of other impressions of the same ad, and therefore any click/conversion can be attributed to the last impression of the ad. However, this assumption is unrealistic, especially in the context of pay-per-conversion advertising, where it is well known in the marketing literature that the consumer often goes through a purchasing funnel before they make a purchase. Decisions to buy are rarely spontaneous, and therefore are not likely to be triggered by just the last ad impression. In this paper, we observe how the current method of attribution leads to inefficiency in the allocation mechanism. We develop a fairly general model to capture how a sequence of impressions can lead to a conversion, and solve the optimal ad allocation problem in this model. We will show that this allocation can be supplemented with a payment scheme to obtain a mechanism that is incentive compatible for the advertiser and fair for the publishers.
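
To see why last-impression attribution can misprice impressions, consider a toy sequence-aware model in which the probability of conversion depends on how many impressions the user has already seen; the value of showing one more ad is then its marginal lift rather than the full conversion value. The model below is only a stand-in for the paper's more general formulation, with made-up numbers.

```python
# Toy sequence-aware conversion model: conversion probability depends on the
# number of impressions seen so far, so an impression is worth its marginal lift,
# not the full conversion value credited to the "last" impression.
def marginal_value(p_convert, impressions_so_far, value_per_conversion):
    """p_convert[i] = probability of conversion after i impressions (non-decreasing)."""
    before = p_convert[min(impressions_so_far, len(p_convert) - 1)]
    after = p_convert[min(impressions_so_far + 1, len(p_convert) - 1)]
    return value_per_conversion * (after - before)

p = [0.0, 0.01, 0.03, 0.04]          # lift tails off after the third impression
print(marginal_value(p, 0, 100.0))   # 1.0: the first impression adds a 1% chance of converting
print(marginal_value(p, 2, 100.0))   # 1.0: the third impression adds another 1%
print(marginal_value(p, 3, 100.0))   # 0.0: further impressions add nothing in this toy model
```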
