AdWords in a Panorama
Zhiyi Huang∗   Qiankun Zhang†   Yuhao Zhang‡

August 2020
Abstract
Three decades ago, Karp, Vazirani, and Vazirani (STOC 1990) defined the online matching problem and gave an optimal (1 − 1/e) ≈ 0.632-competitive algorithm. Fifteen years later, Mehta, Saberi, Vazirani, and Vazirani introduced the first generalization called AdWords, driven by online advertising, and obtained the optimal 1 − 1/e competitive ratio in the special case of small bids. It has been open ever since whether there is an algorithm for general bids better than the 0.5-competitive greedy algorithm. This paper presents a 0.5016-competitive algorithm for AdWords, answering this open question on the positive end. The algorithm builds on several ingredients, including a combination of the online primal dual framework and the configuration linear program of matching problems, a novel formulation of AdWords which we call the panorama view, and a generalization of the online correlated selection technique which we call the panoramic online correlated selection.

∗ The University of Hong Kong. Email: [email protected].
† The University of Hong Kong. Email: [email protected].
‡ The University of Hong Kong. Email: [email protected].

1 Introduction
Consider an ad platform in online advertising, e.g., a search engine in the case of sponsored search. Each advertiser on the platform provides its bids for a set of keywords, for which it likes its ad to be shown. It further has a budget which upper bounds its payment in a day. When a user submits a request, often referred to as an impression, the platform sees the bids of the advertisers for it. The platform then selects an advertiser, who pays either its bid or its remaining budget, whichever is smaller. The goal of the platform is to allocate impressions to advertisers to maximize the total payment of the advertisers. The revenues of online advertising in the US have surpassed those of television advertising in 2016, and have totaled more than a hundred billion dollars a year since [5].

Karp, Vazirani, and Vazirani [30] introduced the online matching problem three decades ago and gave an optimal 1 − 1/e ≈ 0.632 competitive ratio. It can be viewed as the special case of the above problem with unit bids and unit budgets. Fifteen years later, Mehta et al. [36] formally formulated it as the
AdWords problem. They introduced an optimal (1 − 1/e)-competitive algorithm under the small-bid assumption: an advertiser's bid for any impression is much smaller than its budget.

Subsequently, AdWords has been studied under stochastic assumptions. Goel and Mehta [17] showed that assuming a random arrival order of the impressions and small bids, a 1 − 1/e competitive ratio can be achieved using the greedy algorithm: allocate each impression to the advertiser who would make the largest payment. Later, the algorithm proposed by Devanur and Hayes [7] achieved the near-optimal competitive ratio of 1 − ε under the same random-arrival and small-bid assumptions. Mirrokni et al. [38] analyzed the algorithm of Mehta et al. [36] in the more restricted unknown iid model, and obtained an improved ratio of 0.76 for small bids. Finally, Devanur et al. [9] proved that the greedy algorithm is (1 − 1/e)-competitive for general bids in the unknown iid model. For small bids, they proposed a (1 − ε)-competitive algorithm. We refer readers to the survey by Mehta [34] for further references.

Little is known, however, about the most general case of AdWords, i.e., with general bids and without stochastic assumptions. On the positive end, we only have the greedy algorithm and the trivial 0.5 competitive ratio; on the negative end, we only know that the 1 − 1/e competitive ratio of online matching cannot be achieved in AdWords. It has been open since Mehta et al. [36] whether there is an online algorithm that achieves a competitive ratio strictly better than 0.5. The main result of the paper is the first online algorithm for AdWords that breaks the 0.5 barrier.

Theorem 1.
There is a 0.5016-competitive algorithm for AdWords.

We develop the algorithm under the online primal dual framework. In a nutshell, by considering an appropriate linear program (LP) of the problem, the online primal dual framework designs the online algorithm according to the optimality conditions of LPs, and uses the objective of the dual LP as the benchmark in the analysis. Buchbinder et al. [4] applied it to AdWords with small bids, using the standard matching LP, to obtain an alternative analysis of the (1 − 1/e)-competitive algorithm by Mehta et al. [36]. Later, Devanur and Jain [8] and Devanur et al. [10] found further applications of the framework in other online matching problems. Recently, Huang and Zhang [21] demonstrated an advantage of using the configuration LP instead of the standard matching LP in online matching with stochastic rewards. The current paper also builds on the strength of the configuration LP, echoing the message of Huang and Zhang [21]. See Section 2 for details.

Our second ingredient is a novel formulation of AdWords which we call the panorama view. Recall that an advertiser's payment in the original formulation is either the sum of its bids for the assigned impressions or its budget, whichever is smaller. The panorama view further associates each advertiser with an interval whose length equals the budget, and requires the algorithms to assign each impression to not only an advertiser, but further a subset of its interval with size equal to the bid. For example, consider an impression i and an advertiser a whose budget is 2 and whose bid for i is 1. The panorama view associates advertiser a with an interval [0, 2); to assign i to a, the algorithm must further assign i to a subset of size at most 1, e.g., [0.5, 1.5). To see why the panorama view is useful, consider how assigning an impression i to an advertiser a affects the marginal gains of the other impressions assigned to a. Concretely, suppose we shortlist two advertisers for each impression, and then assign it to one of them with a fresh random bit. In the original formulation, having advertiser a in impression i's shortlist decreases the marginal gain of all other impressions that shortlist a in a complicated manner. In the panorama view, however, it decreases the marginal gain only for those whose assigned subsets intersect with i's; more precisely, it decreases the contribution of the intersection by half. See Section 3 for a formal definition of the panorama view and some examples.

Finally, instead of using a fresh random bit to select a shortlisted advertiser for each impression, our algorithm selects one with negative correlation. If a previous impression which shortlists advertiser a with an overlapping subset does not select a, the current one will be more likely to select a. Given the same shortlists, negatively correlated selections get larger expected gains in the panorama view than independent selections. An algorithmic ingredient called online correlated selection (OCS) by Huang and Tao [19, 20] provides a quantitative control of such negative correlation in the special case when bids equal budgets. The final piece of our algorithm is a generalization of OCS which applies to the general case of AdWords in the panorama view. We refer to it as the panoramic OCS (PanOCS). Section 2 includes a formal definition of OCS, Section 4.1 defines the PanOCS and sketches the main ideas behind it, and Section 5 provides the details.

Building on these ingredients, we get an online algorithm whose competitive ratio is strictly better than 0.5, proving Theorem 1.

Related Work. AdWords is closely related to the literature of online matching started by Karp et al. [30]. Aggarwal et al.
[1] studied the vertex-weighted problem and obtained the optimal 1 − 1/e competitive ratio with a generalization of the algorithm by Karp et al. [30]. Feldman et al. [14] investigated edge-weighted online matching in the free-disposal model, where the algorithm may dispose of a previously matched edge for free to make room for a new one. They called it the display ads problem, and achieved the optimal 1 − 1/e competitive ratio assuming large capacities, i.e., each offline vertex can be matched to a large number of online vertices. The analysis was simplified by Devanur et al. [11] under the online primal dual framework. Further, Fahrbach et al. [12, 13] and Huang and Tao [19, 20] obtained a better than 0.5 competitive ratio for general capacities using online correlated selection. Assuming the online vertices arrive in a random order, online matching is known to be better than 1 − 1/e-competitive in this model, and Huang et al. [23] gave a better than 1 − 1/e-competitive algorithm for the vertex-weighted problem. Kesselheim et al. [31] gave a 1/e-competitive algorithm for the edge-weighted problem even without free-disposal. Under the stronger assumption that online vertices are drawn iid from an unknown distribution, Kapralov et al. [28] proved that greedy is (1 − 1/e)-competitive for a more general problem called online submodular welfare maximization which captures both the edge-weighted problem with free-disposal and AdWords as special cases. Further assuming that the distribution is known leads to better competitive ratios [15, 18, 27, 33]. We leave for future research if the algorithm in this paper is better than (1 − 1/e)-competitive under random arrivals.

Finally, Mehta and Panigrahi [35] proposed online matching with stochastic rewards, where an edge chosen by the algorithm is successfully matched only with some probability. They focused on the special case of equal success probabilities and gave algorithms that are 0.567 and 0.534 competitive. Recently, Huang and Zhang [21] obtained improved ratios of 0.576 and 0.572 for vanishing equal and unequal success probabilities respectively. In doing so, they showed an advantage of the configuration LP over the standard matching LP under online primal dual. This paper echoes the above message.
2 Preliminaries

Consider a bipartite graph G = (A, I, E), where A and I are sets of vertices corresponding to the advertisers and impressions in AdWords respectively, and E ⊆ A × I is the set of edges between them. Further, each edge (a, i) is associated with a non-negative real number b_{ai} which represents advertiser a's bid for impression i. By allowing zero bids, we may assume without loss of generality (wlog) that G is a complete bipartite graph, i.e., E = A × I. Finally, each advertiser a is associated with a positive budget B_a which upper bounds the payment of the advertiser. Concretely, assigning a subset of impressions S ⊆ I to an advertiser a leads to a budget-additive payment:

    b_a(S) := min { Σ_{i∈S} b_{ai}, B_a }.

By this definition, we may assume wlog that b_{ai} ≤ B_a for any advertiser a and any impression i. The advertisers are given upfront, while the impressions arrive one at a time. We write i < i′ if an impression i arrives before another impression i′. On the arrival of an impression, the algorithm must immediately and irrevocably assign it to an advertiser. The objective is to maximize the sum of the above payments from all advertisers. Following the standard competitive analysis of online algorithms, an algorithm is Γ-competitive for some competitive ratio 0 ≤ Γ ≤ 1 if its (expected) objective is at least Γ times the offline optimal on any instance.
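To make the payment rule and the greedy baseline concrete, here is a minimal Python sketch (our own illustration, not from the paper); the instance encoding via dictionaries is an assumption of the example.

```python
from collections import defaultdict

def greedy_adwords(budgets, impressions):
    """Assign each arriving impression to the advertiser whose payment would
    increase the most; this is the trivially 0.5-competitive greedy baseline.

    budgets: dict advertiser -> budget B_a
    impressions: list of dicts advertiser -> bid b_ai
    """
    spent = defaultdict(float)  # current payment of each advertiser
    total = 0.0
    for bids in impressions:
        # marginal gain of advertiser a is min(b_ai, remaining budget)
        gains = {a: min(b, budgets[a] - spent[a]) for a, b in bids.items()}
        winner = max(gains, key=gains.get)
        spent[winner] += gains[winner]
        total += gains[winner]
    return total

budgets = {"a1": 2.0, "a2": 2.0}
impressions = [{"a1": 1.0, "a2": 1.0}] * 3
print(greedy_adwords(budgets, impressions))  # 3.0 on this easy instance
```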
Configuration Linear Program. The algorithms in this paper and their analyses rely on the LP relaxations of the problem. Instead of the standard matching LP, this paper considers the more expressive configuration LP below, together with its dual:¹

    maximize    Σ_{a∈A} Σ_{S⊆I} b_a(S) · x_{aS}
    subject to  Σ_{S⊆I} x_{aS} ≤ 1                  ∀a ∈ A
                Σ_{a∈A} Σ_{S∋i} x_{aS} ≤ 1          ∀i ∈ I
                x_{aS} ≥ 0                           ∀a ∈ A, ∀S ⊆ I

    minimize    Σ_{a∈A} α_a + Σ_{i∈I} β_i
    subject to  α_a + Σ_{i∈S} β_i ≥ b_a(S)           ∀a ∈ A, ∀S ⊆ I
                α_a ≥ 0                               ∀a ∈ A
                β_i ≥ 0                               ∀i ∈ I

Let P and D denote the objectives of the primal and dual LPs respectively. Throughout the paper we will always let x_{aS} be the probability that S is the subset of impressions assigned to advertiser a. Then, the primal objective P equals the objective of the algorithm.

¹AdWords as an online algorithm problem does not consider the strategic behaviors of the advertisers. We merely inherit the term bid from the original paper of Mehta et al. [36].
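On tiny instances, the dual constraint of this LP can be checked by brute force over all subsets S; a hedged Python sketch (exponential-time, purely for intuition; all names are ours):

```python
from itertools import chain, combinations

def all_subsets(items):
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def check_dual(alpha, beta, bids, budgets, ratio):
    """Brute-force check of (approximate) dual feasibility for the
    configuration LP: alpha_a + sum_{i in S} beta_i >= ratio * b_a(S)
    for every advertiser a and every subset S of impressions."""
    for a, budget in budgets.items():
        for S in all_subsets(sorted(bids[a])):
            b_aS = min(sum(bids[a][i] for i in S), budget)
            if alpha[a] + sum(beta[i] for i in S) < ratio * b_aS - 1e-12:
                return False, (a, S)
    return True, None

budgets = {"a": 2.0}
bids = {"a": {0: 1.0, 1: 1.0, 2: 1.0}}
print(check_dual({"a": 1.0}, {0: 0.5, 1: 0.5, 2: 0.5}, bids, budgets, ratio=1.0))
# (True, None): this dual assignment is exactly feasible on the toy instance
```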
Online Primal Dual Framework. We build on the online primal dual framework which uses the dual objective as an upper bound of the offline optimal in the competitive analyses of online algorithms. In particular, this paper applies it to the configuration LP of AdWords.
Lemma 2.
Suppose an online algorithm is coupled with a dual algorithm which maintains a dual assignment such that for some 0 ≤ Γ ≤ 1:

1. Approximate dual feasibility: α_a + Σ_{i∈S} β_i ≥ Γ · b_a(S) for any a ∈ A and any S ⊆ I;
2. Reverse weak duality: P ≥ D.

Then, it is Γ-competitive.

Proof. By the first condition, scaling the dual assignment by a factor of Γ⁻¹ makes it feasible while changing the dual objective by the same factor. Therefore, by weak duality of LPs, the offline optimal is at most Γ⁻¹ · D. Putting together with the second condition proves the lemma.

Online Correlated Selection.
The algorithms in this paper further utilize a recent algorithmic ingredient called online correlated selection (OCS) by Huang and Tao [19, 20]. Consider a set of ground elements, and further a sequence of pairs of these elements arriving one at a time. Suppose we randomly select one element from each pair with a fresh random bit. Then, an element will be selected at least once with probability 1 − 2⁻ᵏ after appearing in k pairs. The OCS correlates the randomness to achieve better efficiency. We state below a simplified definition, removing some aspects irrelevant to AdWords.

Definition 1.
For any 0 ≤ γ ≤ 1, a γ-OCS is an online algorithm ensuring that for any element which appears in k pairs, it is selected at least once with probability at least:

    1 − 2⁻ᵏ · (1 − γ)^{max{k−1, 0}}.

3 Panorama View

The algorithms in this paper are based on a novel viewpoint of the AdWords problem which we call the panorama view. Recall that the payment of an advertiser a is budget-additive in AdWords: assigning a subset of impressions S to an advertiser a gives b_a(S) = min{ Σ_{i∈S} b_{ai}, B_a }. Let µ(·) denote the Lebesgue measure. In the panorama view, we further associate each advertiser a with an interval [0, B_a); each impression i assigned to a is further assigned to a subset Y_{ai} ⊆ [0, B_a) whose Lebesgue measure µ(Y_{ai}) is at most b_{ai}. In fact, we will always choose Y_{ai} to be a finite union of disjoint left-closed, right-open intervals, for which the Lebesgue measure is simply the sum of their lengths. Further define the payment of an advertiser a in the panorama view as:

    µ( ∪_{i∈S} Y_{ai} ).

Correspondingly, the objective in the panorama view is the sum of the above payments from all advertisers. Importantly, it lower bounds the original objective of AdWords.
Lemma 3.
For any advertiser a, any subset of impressions S assigned to a, and any subsets Y_{ai} ⊆ [0, B_a) with Lebesgue measure at most b_{ai} for impressions i ∈ S, we have:

    µ( ∪_{i∈S} Y_{ai} ) ≤ min { Σ_{i∈S} b_{ai}, B_a }.
On the one hand, by subadditivity of the Lebesgue measure function µ, and by the measure upper bounds of the subsets Y_{ai}'s, we have µ(∪_{i∈S} Y_{ai}) ≤ Σ_{i∈S} µ(Y_{ai}) ≤ Σ_{i∈S} b_{ai}. On the other hand, because the Y_{ai}'s are subsets of [0, B_a), we have µ(∪_{i∈S} Y_{ai}) ≤ µ([0, B_a)) = B_a.
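The panorama-view payment is simply the measure of a union of half-open intervals, which is easy to compute explicitly; a minimal sketch (ours, assuming subsets are given as [l, r) pairs):

```python
def union_measure(intervals):
    """Lebesgue measure of a union of half-open intervals [l, r)."""
    total, cur = 0.0, None
    for l, r in sorted(intervals):
        if cur is None or l > cur[1]:
            if cur is not None:
                total += cur[1] - cur[0]
            cur = [l, r]
        else:
            cur[1] = max(cur[1], r)  # merge overlapping intervals
    if cur is not None:
        total += cur[1] - cur[0]
    return total

# budget 2; two overlapping size-1 subsets, as in Lemma 3's two upper bounds
print(union_measure([(0.0, 1.0), (0.5, 1.5)]))  # 1.5 <= min(1 + 1, 2)
```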
Consider an arbitrary deterministic algorithm. Then, whenever it assigns an impression i to an advertiser a, we may wlog further assign it to the leftmost unassigned interval. For instance, suppose impressions 1, 2, 3, and so on are assigned to advertiser a in this order; we may further assign 1 to [0, b_{a1}), 2 to [b_{a1}, b_{a1} + b_{a2}), 3 to [b_{a1} + b_{a2}, b_{a1} + b_{a2} + b_{a3}), and so forth. In doing so, the objective in the panorama view is identical to the original one.
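A short sketch of this leftmost packing (ours; truncation at the budget is handled by capping the right endpoint):

```python
def leftmost_assignment(budget, bids):
    """Example 1: pack impressions left to right on [0, budget); the
    panorama-view payment then equals the budget-additive payment."""
    y, subsets = 0.0, []
    for b in bids:
        right = min(y + b, budget)  # truncate once the budget is exhausted
        subsets.append((y, right))
        y = right
    payment = sum(r - l for l, r in subsets)  # = min(sum(bids), budget)
    return subsets, payment

print(leftmost_assignment(2.0, [0.75, 0.75, 0.75]))
# ([(0.0, 0.75), (0.75, 1.5), (1.5, 2.0)], 2.0)
```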
Oblivious Semi-randomized Algorithms. The panorama view of AdWords separates itself from the original one when it comes to a special family of randomized algorithms which we call the oblivious semi-randomized algorithms. They are semi-randomized in that for every impression i, they either assign it deterministically to an advertiser-subset combination, or choose two advertiser-subset combinations and assign it to one of them with equal marginal probability. We shall refer to the former as a deterministic round and the latter as a randomized round. If an impression i corresponds to a randomized round, we say that it is semi-assigned to the advertiser-subset combinations. For the time being, readers may think of using a fresh random bit in every randomized round for a concrete understanding of the panorama view, although our algorithms will correlate the decisions in different rounds negatively.

Further, these algorithms are oblivious: neither the decisions of deterministic versus randomized rounds, nor the choices of advertiser-subset combinations depend on the realization of random bits in previous rounds. Hence, the semi-assignments to the same advertiser may have overlapping subsets and thus, the objective in the panorama view no longer equals the original one in general.

Example 2 (Oblivious Semi-randomized Algorithms).
Let there be two advertisers whose budgets equal 2, and three impressions for which both advertisers bid 1. Further suppose that we select with a fresh random bit for each impression. In the original budget-additive payments, with probability 1/4 all impressions are assigned to the same advertiser and thus the objective equals 2; otherwise, the objective equals 3. Hence, the expected objective equals (1/4) · 2 + (3/4) · 3 = 2.75. In the panorama view, however, the algorithm must further assign each impression to a subset of [0, 2) for both advertisers. It is wlog to assign the first impression to [0, 1) and the second impression to [1, 2) so that it is disjoint with the first one and contributes 1 to the objective. However, the third impression only contributes 1/2 to the objective regardless of the choices of subsets because the entire interval [0, 2) has been semi-assigned once. Therefore, the expected objective is only 1 + 1 + 1/2 = 2.5.

It may seem odd to restrict ourselves to oblivious algorithms. Would it not be better if we first check the realized assignments in earlier rounds and then pick an advertiser-subset combination disjoint with the previous ones? In a nutshell, we focus on oblivious algorithms to separate the algorithmic component for choosing assignments and semi-assignments, and that for correlating the decisions in different randomized rounds. Importantly, we can achieve negative correlation: a semi-assignment is more likely to get selected if an earlier overlapping one is not. By contrast, screening the options based on the realization of earlier random bits could lead to positive correlations (e.g., weighted sampling without replacements [2]). That said, there may be AdWords algorithms with controlled positive correlations which are better than 0.5-competitive.
Bookkeeping at the Point-level. It is more convenient to account for the primal objective at the point-level as follows. We say that a point y ∈ [0, B_a) of an advertiser a is assigned if there is an impression i assigned to a and a subset Y_{ai} containing y, either due to a deterministic round, or due to a semi-assignment in a randomized round which selected a. For randomized algorithms, let 0 ≤ x_a(y) ≤ 1 denote the probability that y is assigned. Then, the primal objective equals:

    P = Σ_{a∈A} ∫₀^{B_a} x_a(y) dy.    (1)

Similarly, we say that y is semi-assigned whenever an impression is semi-assigned to a and a subset containing y. Let k_a(y) denote the number of times that y is semi-assigned. Further define k_a(y) = ∞ if y has been assigned in a deterministic round, driven by the fact that no finite number of semi-assignments on their own can make a point y assigned with certainty.

We further introduce point-level dual variables α_a(y), a ∈ A, y ∈ [0, B_a), and let:

    α_a = ∫₀^{B_a} α_a(y) dy.    (2)

Then, approximate dual feasibility becomes:

    ∫₀^{B_a} α_a(y) dy + Σ_{i∈S} β_i ≥ Γ · b_a(S).    (3)
Panoramic Interval-level Assignments. We first introduce some notations which are useful throughout the paper. For any point y ∈ [0, B_a), any subset Y ⊆ [0, B_a), and any 0 ≤ b ≤ B_a, let y ⊕_Y b denote the point in [0, B_a) such that the interval [y, y ⊕_Y b) excluding Y has Lebesgue measure b. Here, we abuse notation and allow y ⊕_Y b to be smaller than y, in which case [y, y ⊕_Y b) denotes the union of [y, B_a) and [0, y ⊕_Y b). In the boundary case when the subset [0, B_a) \ Y has a measure strictly less than b, i.e., B_a − µ(Y) < b, define y ⊕_Y b = y. Further define the reverse operation y ⊖_Y b such that [y ⊖_Y b, y) \ Y has measure b.

For any advertiser a and the set of impressions assigned and semi-assigned to it, the algorithm will further select subsets of [0, B_a) greedily as follows. Maintain a point y∗, initially at 0, which represents the start of the next subset. For each impression i, further assign or semi-assign it to [y∗, y∗ ⊕_{Y_D} b_{ai}) \ Y_D, where Y_D is the subset of [0, B_a) that has already been assigned deterministically. Further update y∗ = y∗ ⊕_{Y_D} b_{ai}. To this end, think of [0, B_a) as a circle by gluing its endpoints; the algorithm scans along the circle to find a subset with measure b_{ai} that has not been deterministically assigned. It is similar to taking a panorama and hence the name of the alternative view of AdWords.

[Figure 1: Illustrative example of interval-level assignment represented by k_a(y). (a) Interval-level assignments; (b) update in a deterministic round; (c) update in a randomized round.]

The panoramic interval-level assignments equalize the numbers of times the points y ∈ [0, B_a) are semi-assigned, among those that have not been deterministically assigned. We omit the proof since it follows by the definition of the algorithm. See Figure 1 for an illustrative example.
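The scan y∗ ↦ y∗ ⊕_{Y_D} b is straightforward to implement by unrolling the circle at y∗; a hedged sketch (ours; `assigned` plays the role of Y_D, and the boundary case is handled loosely):

```python
def free_gaps(B, assigned):
    """Complement of `assigned` (sorted disjoint [l, r) intervals) in [0, B)."""
    gaps, pos = [], 0.0
    for l, r in assigned:
        if l > pos:
            gaps.append((pos, l))
        pos = max(pos, r)
    if pos < B:
        gaps.append((pos, B))
    return gaps

def circular_scan(y_star, b, B, assigned):
    """Compute [y*, y* (+)_{Y_D} b) \\ Y_D: walk clockwise on the circle
    [0, B) from y_star, skipping `assigned`, until measure b is collected.
    Returns (chosen subset, new y_star)."""
    # unroll the circle: the free gaps after y_star, then those before it
    ordered = [(max(l, y_star), r) for l, r in free_gaps(B, assigned) if r > y_star]
    ordered += [(l, min(r, y_star)) for l, r in free_gaps(B, assigned) if l < y_star]
    chosen = []
    for l, r in ordered:
        take = min(b, r - l)
        if take > 0:
            chosen.append((l, l + take))
            b -= take
        if b <= 0:
            break
    new_y_star = chosen[-1][1] % B if chosen else y_star
    return chosen, new_y_star

# budget 2, Y_D = [0.5, 1.0), scan from y* = 1.8 for measure 0.5:
print(circular_scan(1.8, 0.5, 2.0, [(0.5, 1.0)]))
# ([(1.8, 2.0), (0.0, 0.3)], 0.3): wraps around, skipping the assigned part
```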
Lemma 4. For any a ∈ A and any y ∈ [0, B_a), k_a(y) equals either (1) k_min = min_{z∈[0,B_a)} k_a(z), or (2) k_min + 1, or (3) ∞. Further, the first kind satisfies y ≥ y∗, and the second kind satisfies y < y∗.

4 Online Primal Dual Algorithms

This section presents an oblivious semi-randomized algorithm that is better than 0.5-competitive. Section 4.1 first introduces the panoramic online correlated selection (PanOCS), which correlates the randomized decisions in different rounds negatively. Section 4.2 then demonstrates an online algorithm powered by PanOCS, and Section 4.3 analyzes it under the online primal dual framework. Finally, Section 4.4 optimizes the parameters of the algorithm to achieve a competitive ratio strictly better than 0.5 for large bids, i.e., B_a/2 < b_{ai} ≤ B_a, and a smaller yet still better-than-0.5 ratio for general bids.

4.1 Panoramic Online Correlated Selection

Recall the oblivious semi-randomized algorithms. In each randomized round, such an algorithm chooses a pair of advertiser-subset combinations, oblivious to the random bits in previous rounds. Then, the combinations are passed on to an algorithmic component which selects one of them with equal marginal probability, and correlates across different randomized rounds negatively. We call it the PanOCS since it is a generalization of the OCS [19, 20] in the panorama view of AdWords. For a formal definition, recall that x_a(y) is the probability that a point y ∈ [0, B_a) of an advertiser a has been assigned, and k_a(y) is the number of randomized rounds in which y is semi-assigned.

Definition 2. A PanOCS is an online algorithm which takes a sequence of pairs of advertiser-subset combinations as input, and for each pair selects one combination. It is a γ-PanOCS for some 0 ≤ γ ≤ 1 if for any advertiser a, and any point y ∈ [0, B_a), we have:

    x_a(y) ≥ 1 − 2^{−k_a(y)} · (1 − γ)^{max{k_a(y)−1, 0}}.    (4)

Observe that using an independent random bit in every randomized round is a 0-PanOCS, since the probability of being assigned after k semi-assignments with independent random bits is precisely 1 − 2⁻ᵏ. The parameter γ quantifies the advantage over independent random bits.

The intuition behind the inequality is best explained with a thought experiment. Suppose that whenever y is semi-assigned other than the first time, there is a γ chance to be perfectly negatively correlated with the last semi-assignment of y: a is chosen this time if it is not chosen last time, and vice versa. Further suppose that the above events are negatively dependent for the k_a(y) − 1 semi-assignments of y after the first one. Then, y is never assigned only if none of the events happens, whose probability is at most (1 − γ)^{k_a(y)−1}, and further when none of the k_a(y) independent selections picks a, which happens with probability 2^{−k_a(y)}. Our analysis will substantiate this intuition.

To see the connection with OCS, consider a special case when the bids equal the budgets, i.e., b_{ai} = B_a. Then, since x_a(y) and k_a(y) are independent of y, it suffices to consider if advertiser a has been assigned. As a result, the above definition coincides with the definition of OCS, taking the advertisers as the ground elements. The extra challenge of PanOCS is to ensure the inequality simultaneously for all points y ∈ [0, B_a) when the bids are arbitrary.

Theorem 5.
Suppose all nonzero bids are large, i.e., B_a/2 < b_{ai} ≤ B_a or b_{ai} = 0 for any advertiser a ∈ A and any impression i ∈ I. Then, there is a γ-PanOCS for an absolute constant γ > 0.

Theorem 6.
Suppose the algorithm makes at most k_max semi-assignments to any point y ∈ [0, B_a) of any advertiser a ∈ A. Then, there is a γ-PanOCS with γ = Ω(k_max⁻¹).

The proofs of the theorems are deferred to Section 5. We include below a proof sketch of a weaker 1/64-PanOCS for large bids to foreshadow the arguments in our PanOCS analyses.
Proof Sketch of a Weaker Theorem 5 (γ = 1/64). For any impression i semi-assigned to an advertiser a, we write (i + j)_a to denote the j-th impression semi-assigned to a after impression i (or the (−j)-th impression semi-assigned to a before i, if j < 0). Consider any impression i in a randomized round. Suppose it is semi-assigned to advertisers a₁ and a₂. Then, sample a∗ ∈ {a₁, a₂} and j ∈ {−2, −1, 1, 2} uniformly at random. If j > 0, select a₁ or a₂ and the corresponding subsets with a fresh random bit; further, pass the result to future impression (i + j)_{a∗}. If j < 0, check if the past impression (i + j)_{a∗} passes its result to i. If so, make the opposite choice: select a∗ and the corresponding subset if a∗ was not selected in round (i + j)_{a∗}, and vice versa. Otherwise, select with a fresh random bit.

Finally, we argue this is a 1/64-PanOCS. Fix any advertiser a and any point y ∈ [0, B_a). Suppose i₁ < i₂ < · · · < i_k are the impressions semi-assigned to a and subsets containing y. Consider any neighboring i_ℓ and i_{ℓ+1}. We now use the assumption of large bids to get that i_{ℓ+1} is the first or second impression semi-assigned to advertiser a after i_ℓ. Hence, i_ℓ may pass its result to i_{ℓ+1}, which may then make the opposite choice. When it happens, y is assigned exactly once in rounds i_ℓ and i_{ℓ+1}. More precisely, this is when i_ℓ samples a∗ = a and j > 0 with (i_ℓ + j)_a = i_{ℓ+1}, and i_{ℓ+1} samples a∗ = a and j < 0 with (i_{ℓ+1} + j)_a = i_ℓ, which happens with probability 1/64. Moreover, we claim that the events are negatively dependent for the k − 1 neighboring pairs i_ℓ and i_{ℓ+1}. Hence, the probability of having no such pair is at most (1 − 1/64)^{k−1}. Finally, conditioned on having no such pair, the selections in rounds i₁ < i₂ < · · · < i_k are independent, and the probability that a is never selected is only 2⁻ᵏ. Putting together, y has been assigned in at least one of these k rounds with probability no less than 1 − 2⁻ᵏ (1 − 1/64)^{k−1}.

4.2 Online Primal Dual Algorithm

We demonstrate an online primal dual Algorithm 1, taking a γ-PanOCS as a blackbox. Recall that an oblivious semi-randomized algorithm either deterministically assigns i to an advertiser-subset combination, or semi-assigns it to two combinations in a randomized round. In the latter case, let the γ-PanOCS select a combination. Let x̄_a(y) be the lower bound of x_a(y) given by a γ-PanOCS in Eqn. (4), and let P̄ be the corresponding lower bound of the primal objective in Eqn. (1), i.e.:

    x̄_a(y) := 1 − 2^{−k_a(y)} · (1 − γ)^{max{k_a(y)−1, 0}},    P̄ := Σ_{a∈A} ∫₀^{B_a} x̄_a(y) dy.

Let ∆x denote the increment of x̄_a(y) as k_a(y) increases, i.e.:

    ∆x(k) := 1/2 if k = 1;  2^{−k} (1 − γ)^{k−2} (1 + γ) if k ≥ 2.    (5)

An online primal dual algorithm's decision for each impression is driven by maximizing β_i. For each advertiser a, compute two quantities ∆^D_a β_i and ∆^R_a β_i which we shall detail shortly. The former denotes how much β_i would gain if i is assigned to a. The latter denotes how much β_i would gain if i is semi-assigned to a. The corresponding subsets are decided by the panoramic interval-level assignment in Section 3. Then, find advertisers a₁ and a₂ with the largest ∆^R_a β_i, and advertiser a∗ with the largest ∆^D_a β_i. If ∆^R_{a₁} β_i + ∆^R_{a₂} β_i is greater than ∆^D_{a∗} β_i, semi-assign i to a₁ and a₂ in a randomized round. Otherwise, assign i to a∗ in a deterministic round.

Next, we define ∆^D_a β_i and ∆^R_a β_i from two invariants below. First, let the lower bound of primal equal the dual, i.e., P̄ = D. It ensures reverse weak duality in Lemma 2 because P ≥ P̄ = D. Second, recall that the α_a(y)'s account for dual variable α_a at the point-level as explained in Eqn. (2). For a set of parameters ∆α(ℓ), ℓ ≥ 1, which will be optimized in the analysis, let:

    α_a(y) = Σ_{ℓ=1}^{k_a(y)} ∆α(ℓ).    (6)

We first derive ∆^R_a β_i from the invariants. Suppose i is semi-assigned to a and a subset Y_{ai}. For any point y ∈ Y_{ai}, the primal increment due to y is ∆x(k_a(y) + 1), where k_a(y) denotes the value before the semi-assignment. The dual increment in α_a(y) is ∆α(k_a(y) + 1) by the second invariant. Finally, by the first invariant, the increment in β_i due to point y ∈ Y_{ai} shall equal the difference between ∆x(k_a(y) + 1) and ∆α(k_a(y) + 1). For convenience of notations, define:

    ∆β(k) := ∆x(k) − ∆α(k).    (7)

Our choice of ∆α(k) will ensure non-negativity of ∆β(k). Putting together we get that:

    ∆^R_a β_i := ∫_{Y_{ai}} ∆β(k_a(y) + 1) dy.    (8)

Similarly, suppose i is assigned deterministically to a and a subset Y_{ai}. For any point y ∈ Y_{ai}, the primal increment due to y is Σ_{ℓ>k_a(y)} ∆x(ℓ) since k_a(y) becomes ∞; the dual increment in α_a(y) is Σ_{ℓ>k_a(y)} ∆α(ℓ) by the second invariant. Thus, together with the first invariant, we let:

    ∆^D_a β_i := ∫_{Y_{ai}} Σ_{ℓ>k_a(y)} ∆β(ℓ) dy.    (9)
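To make Equations (5)–(9) concrete, here is a small Python sketch (ours) that computes ∆x, ∆β, and the two gains on a discretized subset; the particular split of ∆x into ∆α and ∆β is the closed form of Lemma 13 below, used here only as a forward reference.

```python
def d_x(k, g):
    """Increment of the PanOCS lower bound as k_a(y) reaches k, Eqn. (5)."""
    return 0.5 if k == 1 else 2.0**-k * (1 - g) ** (k - 2) * (1 + g)

def d_alpha(k, g):
    """One feasible parameter choice (the closed form of Lemma 13 below)."""
    return (3 + g) / (6 + 3 * g) * d_x(1, g) if k == 1 else (1 + g) / (2 + g) * d_x(k, g)

def d_beta(k, g):
    return d_x(k, g) - d_alpha(k, g)  # the invariant of Eqn. (7)

def gain_R(ks, g):
    """Randomized gain of Eqn. (8): Y_ai discretized into unit cells whose
    current counters k_a(y) are listed in `ks`."""
    return sum(d_beta(k + 1, g) for k in ks)

def gain_D(ks, g, K=80):
    """Deterministic gain of Eqn. (9); the infinite tail is truncated at K."""
    return sum(sum(d_beta(l, g) for l in range(k + 1, K)) for k in ks)

g = 0.1
print(gain_R([0, 0, 1], g), gain_D([0, 0, 1], g))
# Lemma 7's comparison can be spot-checked here:
print(2 * gain_R([0, 0, 1], g) >= gain_D([0, 0, 1], g))  # True
```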
0, number of times y is semi-assigned; k a ( y ) = ∞ if y is assigned in a deterministic round for all impression i dofor all advertiser a ∈ A do computer subset Y ai ⊆ [0 , B a ) using panoramic interval-level assignment (Section 3)compute ∆ Ra β i and ∆ Da β i according to Equations (7), (8), and (9) end for find a , a that maximize ∆ Ra β i , and a ∗ that maximizes ∆ Da β i if ∆ Ra β i + ∆ Ra β i ≥ ∆ Da ∗ β i assign i to what PanOCS selects between a and a and the corresponding subsets else (i.e., ∆ Ra β i + ∆ Ra β i < ∆ Da ∗ β i ) assign i to a ∗ and the corresponding subset endifend for Recall that reverse weak duality always holds because of the first invariant. Next, we derive a setof conditions on the parameters which imply approximate dual feasibility. These conditions will benumbered. Then, we will optimize the competitive ratio Γ and ∆ α ( k )’s through an LP. For anyadvertiser a and any subset of impressions S ⊆ I , recall approximate dual feasibility in Eqn. (3): (cid:90) B a α a ( y ) dy + (cid:88) i ∈ S β i ≥ b a ( S ) · Γ . First consider a special case when S has only one impression i who bids b ai = B a . This warmupcase is simple enough to be analyzed in around one page, yet is also general enough to derive thebinding conditions on the parameters that are still sufficient in the general case.We will divide into four subcases, depending on whether impression i is assigned to the advertiser a , and whether i is a deterministic or randomized round. In each case, we will lower bound both α a ( y )’s and β i as functions of the k a ( y )’s. To avoid ambiguity, let k a ( y ) be the final value at theend of the algorithm, and k ia ( y ) be the value right before the arrival of impression i . Case 1: Round of i is randomized, and i is not semi-assigned to a . By definition, both a and a chosen by the algorithm contribute at least ∆ Ra β i to β i . Hence, by the definition of ∆ Ra β i in Eqn. (8) and the invariant about α a ( y ) in Eqn. (6), approximate dual feasibility reduces to: (cid:90) B a k a ( y ) (cid:88) (cid:96) =1 ∆ α ( (cid:96) ) dy + 2 (cid:90) B a ∆ β (cid:0) k ia ( y ) + 1 (cid:1) dy ≥ Γ · B a . Since k a ( y ) ≥ k ia ( y ), and the first term is increasing in k a ( y )’s, it suffices to prove the inequalitywhen k a ( y ) equals k ia ( y ). We will ensure the inequality pointwisely for every y ∈ [0 , B a ): ∀ k ≥ k (cid:88) (cid:96) =1 ∆ α ( (cid:96) ) + 2 · ∆ β ( k + 1) ≥ Γ . (10)10 ase 2: Round of i is deterministic, and i is not assigned to a . We reduce to the previouscase by introducing a condition about the superiority of randomized rounds. ∀ k ≥ β ( k ) ≥ ∞ (cid:88) (cid:96) = k +1 ∆ β ( (cid:96) ) . (11) Lemma 7.
Assuming Eqn. (11), for any advertiser a and any impression i, 2 · ∆^R_a β_i ≥ ∆^D_a β_i.

We remark that the lemma holds in the general case as well. Adding ∆β(k) to both sides of Eqn. (11) gives 2 · ∆β(k) ≥ Σ_{ℓ=k}^{∞} ∆β(ℓ). The lemma then follows by the definitions of ∆^R_a β_i and ∆^D_a β_i in Equations (8) and (9). Intuitively, it means that a randomized round with two equally good advertisers in terms of ∆^R_a β_i is better than a deterministic round with only one of them.

By the definition of the algorithm, the advertiser a∗ to which the algorithm deterministically assigns i satisfies ∆^D_{a∗} β_i ≥ ∆^R_{a₁} β_i + ∆^R_{a₂} β_i. Further by 2 · ∆^R_{a∗} β_i ≥ ∆^D_{a∗} β_i because of Lemma 7, we have ∆^R_{a∗} β_i ≥ ∆^R_{a₂} β_i. Thus, we get β_i = ∆^D_{a∗} β_i ≥ ∆^R_{a₁} β_i + ∆^R_{a∗} β_i ≥ 2 · ∆^R_a β_i. The rest is verbatim to Case 1.

Case 3: Round of i is randomized, and i is semi-assigned to a. Since the algorithm chooses not to deterministically assign i to a, we have β_i ≥ ∆^D_a β_i. By the definition of ∆^D_a β_i in Eqn. (9) and the invariant about the α_a(y)'s in Eqn. (6), approximate dual feasibility reduces to:

    ∫₀^{B_a} Σ_{ℓ=1}^{k_a(y)} ∆α(ℓ) dy + ∫₀^{B_a} Σ_{ℓ>k^i_a(y)} ∆β(ℓ) dy ≥ Γ · B_a.

Importantly, since i is semi-assigned to a, we have k_a(y) ≥ k^i_a(y) + 1. By contrast, the previous cases only have k_a(y) ≥ k^i_a(y). It suffices to ensure the inequality pointwise when k_a(y) = k^i_a(y) + 1:

    ∀k ≥ 1:  Σ_{ℓ=1}^{k} ∆α(ℓ) + Σ_{ℓ=k}^{∞} ∆β(ℓ) ≥ Γ.    (12)

Case 4: Round of i is deterministic, and i is assigned to a. We have k_a(y) = ∞ for any y ∈ [0, B_a) after round i because b_{ai} = B_a in the warmup case. Since k_a(y) may already be very large before i arrives, we do not have any nontrivial lower bound of β_i. Hence, the α_a(y)'s on their own must satisfy approximate dual feasibility. By the invariant of the α_a(y)'s in Eqn. (6), it reduces to:

    Σ_{ℓ=1}^{∞} ∆α(ℓ) ≥ Γ.    (13)

Optimizing the competitive ratio.
We shall solve an LP, whose variables are the competitive ratio Γ and the parameters ∆α(k)'s and ∆β(k)'s, and whose constraints are the first invariant about the ∆α(k)'s and ∆β(k)'s in Eqn. (7) and the sufficient conditions for approximate dual feasibility in Equations (10) to (13):

    maximize    Γ
    subject to  Eqn. (7), (10), (11), (12), and (13)
                ∆α(k), ∆β(k) ≥ 0    ∀k ≥ 1.

4.3.2 General Case

We next present a formal proof of approximate dual feasibility in the general case under the invariant in Eqn. (7), and the conditions derived from the warmup, i.e., Equations (10) to (13). To simplify the argument, we further assume monotonicity of ∆β:

    ∀k ≥ 1:  ∆β(k) ≥ ∆β(k + 1).    (14)

We remark that it would be satisfied automatically by the solution of the LP even if it was not stated explicitly. The LP becomes:

    maximize    Γ
    subject to  Eqn. (7), (10), (11), (12), (13), and (14)    (15)
                ∆α(k), ∆β(k) ≥ 0    ∀k ≥ 1.

Lemma 8.
Suppose Γ, the ∆α(k)'s, and the ∆β(k)'s form a solution of the LP in Eqn. (15). Then, Algorithm 1 satisfies approximate dual feasibility, which we restate below:

    ∫₀^{B_a} α_a(y) dy + Σ_{i∈S} β_i ≥ b_a(S) · Γ.

Recall that reverse weak duality always holds. Lemma 2 leads to the next corollary.
Corollary 9.
Algorithm 1 is Γ-competitive for any solution of the LP in Eqn. (15).

Proof of Lemma 8. The values of the α_a(y)'s are determined by the invariant in Eqn. (6). The points y which are deterministically assigned are special because α_a(y) ≥ Γ by Eqn. (6) and Eqn. (13), i.e., α_a(y) on its own satisfies approximate dual feasibility locally at the point-level. To refer to these points in the rest of the argument, define:

    Y_D := { y ∈ [0, B_a) : y is deterministically assigned by the end of the algorithm }.

Similar to the warmup in the last subsection, we will lower bound β_i differently depending on the assignment of the impression i. If i is neither assigned nor semi-assigned to a, we will bound β_i by 2 · ∆^R_a β_i like cases 1 and 2 in the warmup, and will resort to Eqn. (10) and Eqn. (11) and the corresponding Lemma 7. If i is semi-assigned to a, we will bound β_i by ∆^D_a β_i like case 3 in the warmup, and will resort to Eqn. (12). If i is assigned to a deterministically, we will use the trivial bound of β_i ≥ 0. Define:

    N := { i ∈ S : i is neither assigned nor semi-assigned to a },
    R := { i ∈ S : i is a randomized round semi-assigned to a }.

The rest of the proof is a charging argument as follows. We will find subsets Y_N, Y_R ⊆ [0, B_a) and will distribute the contribution from the β_i's for i ∈ N and i ∈ R to the points y ∈ Y_N and y ∈ Y_R respectively by defining β(y)'s such that:

• Subsets Y_N, Y_R, and Y_D are disjoint.

• Subsets Y_N, Y_R, and Y_D have total measure at least b_a(S), i.e.:

    µ(Y_N) + µ(Y_R) + µ(Y_D) ≥ b_a(S).    (16)

• The values of the β(y)'s lower bound the β_i's, i.e.:

    Σ_{i∈N} β_i ≥ ∫_{Y_N} β(y) dy;    (17)
    Σ_{i∈R} β_i ≥ ∫_{Y_R} β(y) dy.    (18)

• The values of the β(y)'s satisfy approximate dual feasibility locally at the point-level, i.e.:

    ∀y ∈ Y_N:  α_a(y) + β(y) ≥ Γ;    (19)
    ∀y ∈ Y_R:  α_a(y) + β(y) ≥ Γ.    (20)

Assuming the above, approximate dual feasibility follows by a sequence of inequalities as follows:

    ∫₀^{B_a} α_a(y) dy + Σ_{i∈S} β_i
        ≥ ∫₀^{B_a} α_a(y) dy + Σ_{i∈N} β_i + Σ_{i∈R} β_i    (N, R ⊆ S and disjoint)
        ≥ ∫₀^{B_a} α_a(y) dy + ∫_{Y_N} β(y) dy + ∫_{Y_R} β(y) dy    (Eqn. (17), (18))
        ≥ ∫_{Y_N ∪ Y_R ∪ Y_D} α_a(y) dy + ∫_{Y_N} β(y) dy + ∫_{Y_R} β(y) dy    (Y_N, Y_R, Y_D ⊆ [0, B_a))
        = ∫_{Y_N} ( α_a(y) + β(y) ) dy + ∫_{Y_R} ( α_a(y) + β(y) ) dy + ∫_{Y_D} α_a(y) dy    (Y_N, Y_R, Y_D disjoint)
        ≥ Γ · µ(Y_N) + Γ · µ(Y_R) + Γ · µ(Y_D)    (Eqn. (19), (20), (13))
        ≥ Γ · b_a(S).    (Eqn. (16))

The rest of the argument substantiates the above plan by constructing subsets Y_N, Y_R, and the corresponding β(y)'s and proving that they satisfy the aforementioned properties. See Figure 2 for an illustration of the construction.

Construction of Y_N and the Corresponding β(y)'s. Similar to cases 1 and 2 in the warmup, we lower bound β_i by 2 times ∆^R_a β_i. If i is a randomized round, both advertisers a₁ and a₂ to which i is semi-assigned contribute at least ∆^R_a β_i, or else advertiser a should have been chosen instead. If i is a deterministic round, it is the same argument as in the warmup, which we restate below for completeness. Since i chooses advertiser a∗ deterministically instead of randomizing between advertisers a∗ and a₁, we have β_i = ∆^D_{a∗} β_i ≥ ∆^R_{a∗} β_i + ∆^R_{a₁} β_i. Further by Eqn. (11) and Lemma 7, we have ∆^D_{a∗} β_i ≤ 2 · ∆^R_{a∗} β_i.
Cancelling ∆^R_{a∗} β_i by combining the two inequalities leads to β_i = ∆^D_{a∗} β_i ≥ 2 · ∆^R_{a₁} β_i ≥ 2 · ∆^R_a β_i.

Recall that the k^i_a(y)'s denote the values of the state variables when impression i arrives, and Y_{ai} denotes the subset given by the panoramic interval-level assignment, should i be semi-assigned or assigned to advertiser a when it arrives. By the definition of ∆^R_a β_i in Eqn. (8):

    ∀i ∈ N:  β_i ≥ 2 ∫_{Y_{ai}} ∆β(k^i_a(y) + 1) dy.    (21)

[Figure 2: Illustrative example of the subsets Y_D, Y_N, and Y_R. (a) Final status of a; (b) constructions of Y_D, Y_N, and Y_R.]

We need to further derive a lower bound w.r.t. the k_a(y)'s at the end of the algorithm. This would be easy if we distributed β_i to the points y ∈ Y_{ai}, since k_a(y) is nondecreasing over time. Such a charging may not work, however, because Y_{ai} may intersect Y_D. The next lemma resolves this.

Lemma 10.
For any subset Ỹ_{ai} with measure at most b_{ai}, we have:

    ∫_{Y_{ai}} ∆β(k^i_a(y) + 1) dy ≥ ∫_{Ỹ_{ai}} ∆β(k_a(y) + 1) dy.

Proof.
Since the panoramic interval-level assignment chooses a subset Y_{ai} of measure b_{ai} with the minimum k^i_a(y)'s, by the monotonicity of ∆β(·) in Eqn. (14) we have:

    ∫_{Y_{ai}} ∆β(k^i_a(y) + 1) dy ≥ ∫_{Ỹ_{ai}} ∆β(k^i_a(y) + 1) dy.

Further observe that k_a(y) ≥ k^i_a(y) for any y ∈ [0, B_a). Applying the monotonicity of ∆β(·) once again proves the lemma.

Before explaining the definition of Y_N, let us recall some notations defined earlier. Let y∗ be the threshold above which y ∉ Y_D satisfies k_a(y) = k^min_a = min_{z∈[0,B_a)} k_a(z), and below which y ∉ Y_D satisfies k_a(y) = k^min_a + 1 (see panoramic interval-level assignment and Lemma 4 in Section 3). For any point y ∈ [0, B_a), any subset Y ⊆ [0, B_a), and any 0 ≤ b ≤ B_a, let y ⊕_Y b denote the point in [0, B_a) such that [y, y ⊕_Y b) excluding Y has Lebesgue measure b. Further recall our abuse of notation which allows y ⊕_Y b to be smaller than y, in which case [y, y ⊕_Y b) denotes the union of [y, B_a) and [0, y ⊕_Y b). In the boundary case when b ≥ B_a − µ(Y), define y ⊕_Y b = y. Define the inverse operator y ⊖_Y b similarly. We write i′ < i if i′ arrives before i.

For any i ∈ N, define Ỹ_{ai} as:

    Ỹ_{ai} := [ y∗ ⊕_{Y_D} Σ_{i′∈N: i′<i} b_{ai′},  y∗ ⊕_{Y_D} Σ_{i′∈N: i′≤i} b_{ai′} ) \ Y_D.    (22)

In other words, scan forward through the interval [0, B_a) starting from y∗, treating it as a circle and skipping Y_D, and construct the Ỹ_{ai}'s for i ∈ N one at a time in their arrival order, letting each be a subset excluding Y_D with measure up to b_{ai}. Define Y_N and the corresponding β(y) as:

    Y_N := ∪_{i∈N} Ỹ_{ai},    ∀y ∈ Y_N: β(y) := 2 · ∆β(k_a(y) + 1).    (23)

Eqn. (17) then follows by Eqn. (21) and Lemma 10 applied with the Ỹ_{ai}'s, and Eqn. (19) reduces pointwise to Eqn. (10) by the invariant in Eqn. (6).

Construction of Y_R and the Corresponding β(y)'s. Similar to case 3 in the warmup, for any i ∈ R we have β_i ≥ ∆^D_a β_i, or else the algorithm would have assigned i deterministically. By the definition of ∆^D_a β_i in Eqn. (9):

    ∀i ∈ R:  β_i ≥ ∫_{Y_{ai}} Σ_{ℓ=k^i_a(y)+1}^{∞} ∆β(ℓ) dy.    (25)

We need a counterpart of Lemma 10 for these tail sums.

Lemma 11. For any i ∈ R, any subset Ỹ_{ai} with measure at most b_{ai}, and any pointwise bounds k̂_a(y) ≥ k^i_a(y), we have:

    ∫_{Y_{ai}} Σ_{ℓ=k^i_a(y)+1}^{∞} ∆β(ℓ) dy ≥ ∫_{Ỹ_{ai}} Σ_{ℓ=k̂_a(y)+1}^{∞} ∆β(ℓ) dy.

Proof.
Since the panoramic interval-level assignment chooses a subset Y_{ai} of measure b_{ai} with the minimum k^i_a(y)'s, we have:

    ∫_{Y_{ai}} Σ_{ℓ=k^i_a(y)+1}^{∞} ∆β(ℓ) dy ≥ ∫_{Ỹ_{ai}} Σ_{ℓ=k^i_a(y)+1}^{∞} ∆β(ℓ) dy.

The lemma then follows by the assumption that k̂_a(y) ≥ k^i_a(y) for any y ∈ [0, B_a).

For any i ∈ R, define Ỹ_{ai} as:

    Ỹ_{ai} := [ y∗ ⊖_{Y_D} Σ_{i′∈R: i′≥i} b_{ai′},  y∗ ⊖_{Y_D} Σ_{i′∈R: i′>i} b_{ai′} ) \ Y_D.    (26)

In other words, scan backwards through the interval [0, B_a) starting from y∗ and treating it as a circle by gluing its endpoints. Then, construct the Ỹ_{ai}'s for i ∈ R one at a time in the opposite of their arrival order, letting each be a subset excluding Y_D with measure up to b_{ai}. If Σ_{i∈R} b_{ai} ≤ B_a − µ(Y_D), which we consider to be the canonical case, these would be the panoramic interval-level assignments if the i ∈ R arrived at the end of the instance, assuming the same final state of the algorithm. Again, by the definition of the boundary case, we stop scanning through [0, B_a) after a full circle and, therefore, the above Ỹ_{ai}'s are disjoint.

Define Y_R and the corresponding β(y) as:

    Y_R := ( ∪_{i∈R} Ỹ_{ai} ) \ Y_N,    ∀y ∈ Y_R: β(y) := Σ_{ℓ=k_a(y)}^{∞} ∆β(ℓ).    (27)

Proof of Eqn. (18). For any i ∈ R, define k^{-i}_a(y) by considering what the state variables of advertiser a would have been before the arrival of i if the impressions in R were the latest ones in the instance. More precisely, for any i ∈ R, let:

    k^{-i}_a(y) := k_a(y) − 1  if y ∈ [ y∗ ⊖_{Y_D} Σ_{i′∈R: i′≥i} b_{ai′}, y∗ ) \ Y_D;
                   k_a(y)      otherwise.

Intuitively, these are the largest possible values of the k^i_a(y)'s, as the next lemma formalizes.

Lemma 12. For any i ∈ R and any y ∈ [0, B_a), we have k^{-i}_a(y) ≥ k^i_a(y).

Proof. Suppose for contradiction that k^{-i}_a(y₀) < k^i_a(y₀) for some impression i ∈ R and some point y₀ ∈ [0, B_a). By the definition of k^{-i}_a(y₀), this means k^i_a(y₀) = k_a(y₀) < ∞, and k^{-i}_a(y₀) = k_a(y₀) − 1. Importantly, by k^i_a(y₀) = k_a(y₀), the panoramic assignment can only assign or semi-assign impressions after and including i to points in (y₀, y∗), and at most once per point. This is the backbone of the proof.

Its first implication is that all impressions i′ ∈ R after and including i are semi-assigned to disjoint subsets of (y₀, y∗):

    ∪_{i′∈R: i′≥i} Y_{ai′} ⊂ (y₀, y∗).

Its second implication, which may be less obvious, is that points in Y_D cannot be semi-assigned since the arrival of impression i. Consider any point y ∈ Y_D. If it has already been deterministically assigned by the time impression i arrives, the claim holds trivially. Otherwise, it can be assigned or semi-assigned at most once since the arrival of i. This last opportunity must be used for a deterministic assignment or else y would not be in Y_D. Together with the first implication, we get that all impressions i′ ∈ R after and including i are semi-assigned to disjoint subsets of (y₀, y∗) \ Y_D:

    ∪_{i′∈R: i′≥i} Y_{ai′} ⊂ (y₀, y∗) \ Y_D.

However, the LHS has total measure Σ_{i′∈R: i′≥i} b_{ai′} by definition, and the RHS has total measure strictly less than that due to k^{-i}_a(y₀) = k_a(y₀) − 1. We have a contradiction.

Consider any i ∈ R. By definition, Ỹ_{ai} is a subset with measure at most b_{ai}. Further, Lemma 12 above allows us to apply Lemma 11:

    ∫_{Y_{ai}} Σ_{ℓ=k^i_a(y)+1}^{∞} ∆β(ℓ) dy ≥ ∫_{Ỹ_{ai}} Σ_{ℓ=k^{-i}_a(y)+1}^{∞} ∆β(ℓ) dy.

Finally, observe that k^{-i}_a(y) = k_a(y) − 1 for any y ∈ Ỹ_{ai}. We have:

    ∫_{Y_{ai}} Σ_{ℓ=k^i_a(y)+1}^{∞} ∆β(ℓ) dy ≥ ∫_{Ỹ_{ai}} Σ_{ℓ=k_a(y)}^{∞} ∆β(ℓ) dy.    (28)

Eqn. (18) then follows by Eqn. (25), the above inequality in Eqn. (28), and the definition of Y_R and the corresponding β(y) for y ∈ Y_R in Eqn. (27), through a sequence of inequalities as follows:

    Σ_{i∈R} β_i ≥ Σ_{i∈R} ∫_{Y_{ai}} Σ_{ℓ=k^i_a(y)+1}^{∞} ∆β(ℓ) dy    (Eqn. (25))
               ≥ Σ_{i∈R} ∫_{Ỹ_{ai}} Σ_{ℓ=k_a(y)}^{∞} ∆β(ℓ) dy    (Eqn. (28))
               ≥ ∫_{Y_R} β(y) dy.    (Eqn. (27))

Proof of Eqn. (20). Writing α_a(y) as Σ_{ℓ=1}^{k_a(y)} ∆α(ℓ) by the invariant in Eqn. (6), and by the definition of β(y) in Eqn. (27), Eqn. (20) reduces to Eqn. (12), which is a constraint in the LP:

    α_a(y) + β(y) = Σ_{ℓ=1}^{k_a(y)} ∆α(ℓ) + Σ_{ℓ=k_a(y)}^{∞} ∆β(ℓ) ≥ Γ.    (Eqn. (6), (27), and (12))

Disjointness. This follows directly by the constructions of Y_N and Y_R. The construction of Y_N explicitly rules out points in Y_D in Eqn. (22). The construction of Y_R explicitly rules out points in Y_D in Eqn. (26), and rules out points in Y_N in Eqn. (27).

Bounding the Total Measure: Proof of Eqn. (16). Observe that:

    ∪_{i∈N} Ỹ_{ai} = [ y∗, y∗ ⊕_{Y_D} Σ_{i∈N} b_{ai} ) \ Y_D,
    ∪_{i∈R} Ỹ_{ai} = [ y∗ ⊖_{Y_D} Σ_{i∈R} b_{ai}, y∗ ) \ Y_D.

If Σ_{i∈N} b_{ai} + Σ_{i∈R} b_{ai} ≥ B_a − µ(Y_D), the union of Y_D, Y_N, and Y_R covers [0, B_a). Eqn. (16) then holds trivially because the LHS equals B_a and the RHS is upper bounded by B_a. Otherwise, the definition of Y_R simplifies to Y_R = ∪_{i∈R} Ỹ_{ai}, and we have:

    µ(Y_R) = Σ_{i∈R} b_{ai},    µ(Y_N) = Σ_{i∈N} b_{ai}.

Finally, any i ∈ S \ (N ∪ R) is deterministically assigned to a and therefore:

    µ(Y_D) ≥ Σ_{i∈S\(N∪R)} b_{ai}.

Putting together proves Eqn. (16):

    µ(Y_N) + µ(Y_R) + µ(Y_D) ≥ Σ_{i∈S} b_{ai} ≥ b_a(S).

4.4 Optimizing the Parameters

It remains to optimize the parameters ∆α(k)'s and ∆β(k)'s and the competitive ratio Γ by solving the LP in Eqn. (15). Observe that the LP has countably infinitely many variables and constraints and therefore cannot be directly solved with an LP solver. One possible strategy is to solve a finite restriction by setting ∆α(k) = ∆β(k) = 0 for k > k_max with some sufficiently large k_max. This is indeed the strategy we use for the hybrid algorithm in Section 6. Fortunately, the LP in Eqn. (15) admits benign structures. As a result, we can provide an explicit solution.
Lemma 13. For any 0 ≤ γ ≤ 1, the following is a solution of the LP in Eqn. (15):

    Γ = (3 + 2γ) / (6 + 3γ);

    ∆α(k) = (3 + γ)/(6 + 3γ) · ∆x(1)   if k = 1;
            (1 + γ)/(2 + γ) · ∆x(k)    if k ≥ 2;

    ∆β(k) = (3 + 2γ)/(6 + 3γ) · ∆x(1)  if k = 1;
            1/(2 + γ) · ∆x(k)          if k ≥ 2.    (29)
Proof. Below we verify the constraints of the LP, i.e., Eqn. (7), (10), (11), (12), (13), and (14).

Eqn. (7): The invariant ∆α(k) + ∆β(k) = ∆x(k) is guaranteed explicitly by the definition of the ∆α(k)'s and ∆β(k)'s above.

Eqn. (10): It holds with equality as shown below:

    Σ_{ℓ=1}^{k} ∆α(ℓ) + 2 · ∆β(k + 1)
        = ∆α(1) + Σ_{ℓ=2}^{k} ∆α(ℓ) + 2 · ∆β(k + 1)
        = (3 + γ)/(6 + 3γ) · ∆x(1) + (1 + γ)/(2 + γ) · Σ_{ℓ=2}^{k} ∆x(ℓ) + 2/(2 + γ) · ∆x(k + 1)
        = (3 + γ)/(12 + 6γ) + (1 + γ)/(4 + 2γ) · ( 1 − ((1 − γ)/2)^{k−1} ) + (1 + γ)/(4 + 2γ) · ((1 − γ)/2)^{k−1}
        = (3 + 2γ)/(6 + 3γ).

Eqn. (11): By definition, we have:

    Σ_{ℓ=k+1}^{∞} ∆β(ℓ) = 1/(2 + γ) · Σ_{ℓ=k+1}^{∞} ∆x(ℓ).

Next we show:

    Σ_{ℓ=k+1}^{∞} ∆x(ℓ) ≤ ∆x(k).

It holds with equality when k = 1 because both sides equal 1/2. When k ≥ 2, it follows by the observation that ∆x(ℓ + 1) ≤ ∆x(ℓ)/2 for any ℓ ≥ 2, so that the tail sum is at most 2∆x(k + 1) ≤ ∆x(k). Hence:

    Σ_{ℓ=k+1}^{∞} ∆β(ℓ) ≤ 1/(2 + γ) · ∆x(k).

The RHS equals ∆β(k) by definition when k ≥ 2. For k = 1, it is less than ∆β(1) due to (3 + 2γ)/(6 + 3γ) ≥ 1/(2 + γ) for any γ ≥ 0. In both cases we get Eqn. (11).

Eqn. (12): First consider k = 1. The constraint holds with strict inequality:

    Σ_{ℓ=1}^{k} ∆α(ℓ) + Σ_{ℓ=k}^{∞} ∆β(ℓ) = ∆α(1) + ∆β(1) + Σ_{ℓ=2}^{∞} ∆β(ℓ)
        = ∆x(1) + 1/(2 + γ) · Σ_{ℓ=2}^{∞} ∆x(ℓ)
        = 1/2 + 1/(4 + 2γ)
        > (3 + 2γ)/(6 + 3γ).

The last inequality holds for any γ ≤ 1. Next, consider k ≥ 2. The constraint still holds with strict inequality:

    Σ_{ℓ=1}^{k} ∆α(ℓ) + Σ_{ℓ=k}^{∞} ∆β(ℓ)
        = ∆α(1) + Σ_{ℓ=2}^{k−1} ∆α(ℓ) + ( ∆α(k) + ∆β(k) ) + Σ_{ℓ=k+1}^{∞} ∆β(ℓ)
        = (3 + γ)/(12 + 6γ) + (1 + γ)/(4 + 2γ) · ( 1 − ((1 − γ)/2)^{k−2} ) + (1 + γ) · 2^{−k} (1 − γ)^{k−2} + 1/(2 + γ) · 2^{−k} (1 − γ)^{k−1}
        = (3 + 2γ)/(6 + 3γ) + (1 + γ²)/(2 + γ) · 2^{−k} (1 − γ)^{k−2}
        > (3 + 2γ)/(6 + 3γ).

Eqn. (13): It holds with equality. In fact, it can be seen as the limit case of Eqn. (10) or Eqn. (12) when k goes to infinity. We include the calculation below for completeness:

    Σ_{ℓ=1}^{∞} ∆α(ℓ) = ∆α(1) + Σ_{ℓ=2}^{∞} ∆α(ℓ)
        = (3 + γ)/(6 + 3γ) · ∆x(1) + (1 + γ)/(2 + γ) · Σ_{ℓ=2}^{∞} ∆x(ℓ)
        = (3 + γ)/(12 + 6γ) + (1 + γ)/(4 + 2γ)
        = (3 + 2γ)/(6 + 3γ).

Eqn. (14): By ∆x(k) > ∆x(k + 1) and (3 + 2γ)/(6 + 3γ) ≥ 1/(2 + γ), we have ∆β(k) > ∆β(k + 1) from the definition of the ∆β(k)'s. The strict inequality substantiates our earlier remark that the constraint would be satisfied automatically by the solution of the LP even if it was not stated explicitly.
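These identities are easy to spot-check numerically; a short script (ours, with the infinite tails truncated far past convergence):

```python
def check_lemma13(g, K=60, eps=1e-9):
    """Verify Eqn. (10), (11), (12) for the closed-form solution of Lemma 13."""
    dx = {k: 0.5 if k == 1 else 2.0**-k * (1 - g) ** (k - 2) * (1 + g)
          for k in range(1, K + 2)}
    Gam = (3 + 2 * g) / (6 + 3 * g)
    da = {k: (3 + g) / (6 + 3 * g) * dx[1] if k == 1 else (1 + g) / (2 + g) * dx[k]
          for k in dx}
    db = {k: dx[k] - da[k] for k in dx}  # the invariant of Eqn. (7)
    for k in range(0, K):  # Eqn. (10), holds with equality
        assert sum(da[l] for l in range(1, k + 1)) + 2 * db[k + 1] >= Gam - eps
    for k in range(1, K):  # Eqn. (11)
        assert db[k] >= sum(db[l] for l in range(k + 1, K + 2)) - eps
    for k in range(1, K):  # Eqn. (12), holds with slack
        lhs = sum(da[l] for l in range(1, k + 1)) + sum(db[l] for l in range(k, K + 2))
        assert lhs >= Gam - eps
    print("Lemma 13 constraints hold; Gamma =", Gam)

check_lemma13(0.25)
```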
Large Bids: the Crux of AdWords. In light of the positive results by Mehta et al. [36] for small bids that are at most half the budgets, the case of large bids, i.e., B_a/2 < b_{ai} ≤ B_a, can be viewed as the crux of the AdWords problem. As a direct corollary of Lemma 13 and the PanOCS for large bids in Theorem 5, we break the 0.5 barrier in this case.

Theorem 14.
Suppose all nonzero bids are large, i.e., B_a/2 < b_{ai} ≤ B_a or b_{ai} = 0 for any advertiser a ∈ A and any impression i ∈ I. Then, Algorithm 1 with the γ-PanOCS in Theorem 5 is Γ-competitive for:

    Γ = (3 + 2γ)/(6 + 3γ) > 1/2.

General Bids: a Weaker Version of Theorem 1. Next, consider a restricted version of Algorithm 1 such that for any advertiser a and any point y ∈ [0, B_a), y is semi-assigned at most k_max times for some positive integer k_max. This can be achieved by letting ∆x(k) = ∆α(k) = ∆β(k) = 0 for any k > k_max. The restriction allows us to use the PanOCS for general bids in Theorem 6. We need, however, a solution to the LP in Eqn. (15) under the restriction. A natural choice is adopting the solution in Lemma 13 directly for k ≤ k_max, and decreasing the competitive ratio Γ accordingly to preserve feasibility.

Lemma 15.
For any 0 ≤ γ ≤ 1, the following is a solution of the LP in Eqn. (15):

    Γ = (3 + 2γ)/(6 + 3γ) − 2^{−k_max} (1 − γ)^{k_max−1};

    ∆α(k) = (3 + γ)/(6 + 3γ) · ∆x(1)   if k = 1;
            (1 + γ)/(2 + γ) · ∆x(k)    if 2 ≤ k ≤ k_max;
            0                           if k > k_max;

    ∆β(k) = (3 + 2γ)/(6 + 3γ) · ∆x(1)  if k = 1;
            1/(2 + γ) · ∆x(k)          if 2 ≤ k ≤ k_max;
            0                           if k > k_max.    (30)
It follows by Lemma 13 that the contributions of the ∆α(k)'s and ∆β(k)'s, k > k_max, to the approximate dual feasibility constraints, i.e., Eqn. (10), (12), and (13), are at most:

    Σ_{ℓ=k_max+1}^{∞} ∆x(ℓ) = 2^{−k_max} (1 − γ)^{k_max−1}.

Hence, decreasing the competitive ratio Γ by this amount restores approximate dual feasibility even after setting ∆α(k) = ∆β(k) = 0 for k > k_max. Finally, observe that the other constraints, i.e., Eqn. (7), (11), and (14), are trivially preserved after letting ∆x(k) = ∆α(k) = ∆β(k) = 0 for k > k_max.

Consider the γ-PanOCS in Theorem 6 where γ = Ω(k_max⁻¹). The competitive ratio from the above solution is:

    Γ = (3 + 2γ)/(6 + 3γ) − 2^{−k_max} (1 − γ)^{k_max−1} = 1/2 + Ω(k_max⁻¹) − 2^{−k_max} (1 − γ)^{k_max−1}.

The second term is inversely proportional to k_max while the third term decreases exponentially in k_max. Hence, by choosing a sufficiently large k_max, the competitive ratio is strictly larger than half. Indeed, letting k_max = 18 gives Γ > 1/2.
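The tradeoff between the two correction terms is easy to see numerically; a quick sketch (ours), where the constant c in γ = c/k_max is an assumed placeholder rather than the constant from Theorem 6:

```python
def ratio(c, k_max):
    """Competitive ratio of Lemma 15 with gamma = c / k_max (c is an assumption)."""
    g = c / k_max
    return (3 + 2 * g) / (6 + 3 * g) - 2.0**-k_max * (1 - g) ** (k_max - 1)

# the gain over 1/2 shrinks like 1/k_max while the loss shrinks like 2^-k_max,
# so a moderate k_max already breaks the 0.5 barrier
for k in (6, 10, 18, 30):
    print(k, ratio(0.1, k))
```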
5 Panoramic Online Correlated Selection

This section details the design and analysis of the PanOCS algorithms used in the last section. We first restate the definition of PanOCS below.
Definition 2.
A PanOCS is an online algorithm which takes a sequence of pairs of advertiser-subset combinations as input, and for each pair selects one combination. It is a γ-PanOCS for some 0 ≤ γ ≤ 1 if for any advertiser a, and any point y ∈ [0, B_a), we have:

    x_a(y) ≥ 1 − 2^{−k_a(y)} · (1 − γ)^{max{k_a(y)−1, 0}}.    (4)

We start with a warmup algorithm in Subsection 5.1 which gives a simple yet weaker 1/64-PanOCS for large bids, substantiating the proof sketch in the previous section. Then, we explain how to improve and generalize the warmup algorithm to prove Theorem 5 and Theorem 6 in Subsection 5.2 and Subsection 5.3 respectively.

5.1 A 1/64-PanOCS for Large Bids

This subsection explains the basics of PanOCS algorithms and their analyses through a proof of the following theorem.
Theorem 16 (Weaker Version of Theorem 5). Suppose all nonzero bids are large, i.e., we have B_a/2 < b_{ai} ≤ B_a or b_{ai} = 0 for any a ∈ A and any i ∈ I. Then, there is a (1/64 ≈ 0.0156)-PanOCS.

We adopt the concepts of ex-ante and ex-post dependence graphs from the research on OCS by Huang and Tao [19, 20]. To avoid confusion with the vertices and edges in the bipartite graph of AdWords, we shall refer to the counterparts in the dependence graphs as nodes and arcs respectively.
Ex-ante Dependence Graph.
Let I_R denote the set of impressions in randomized rounds. The ex-ante dependence graph D is a directed graph with a node for every impression i ∈ I_R; we abuse notation and refer to the node also as i. Recall that we write i < i′ if i arrives before i′.

Definition 3 (Correlation among Randomized Rounds). Suppose i < i′ are two impressions semi-assigned to an advertiser a and to subsets Y_{ai} and Y_{ai′} respectively.

1. They are related w.r.t. advertiser a if the subsets overlap, i.e., if there exists y ∈ Y_{ai} ∩ Y_{ai′}.
2. If further there is no impression i″ between them, i.e., i < i″ < i′, which is also semi-assigned to advertiser a and a subset containing y, we say that i < i′ are adjacent w.r.t. advertiser a.
3. Otherwise, we say that i < i′ are unrelated w.r.t. advertiser a.

For large bids, two impressions semi-assigned to the same advertiser a are always related. We make the above definition more general so that it applies to arbitrary bids.

Let there be an arc (i, i′)_a in the ex-ante dependence graph D if i < i′ are adjacent w.r.t. a. The subscript helps distinguish parallel arcs, when i and i′ are semi-assigned to the same pairs of advertisers (yet potentially distinct subsets). See Figure 3 for an illustrative example.

[Figure 3: Example of the ex-ante dependence graph. (a) Semi-assignments to advertiser a; (b) arcs w.r.t. advertiser a.]

Further, we sometimes say that i < i′ are related or adjacent without specifying an advertiser, which means the relation holds for some advertiser a. Similarly, we say that i < i′ are unrelated without specifying an advertiser, which means they are unrelated w.r.t. any advertiser a.

Informally, our PanOCS ensures that for any pair of adjacent nodes i < i′, with probability γ the γ-PanOCS correlates the decisions perfectly negatively: it selects advertiser a in round i′ if it does not select a in round i, and vice versa. Further, if two nodes are related, the selections therein are either independent or negatively correlated. Finally, if two nodes are unrelated, the selections could be arbitrarily correlated. In this subsection and the next, the PanOCS algorithms for large bids make pairwise independent decisions for unrelated nodes. The PanOCS for general bids in the last subsection, however, crucially utilizes the freedom of correlating unrelated nodes positively.

Lemma 17.
If all nonzero bids are large, any i ∈ I R has at most out-arcs and at most in-arcs.Proof. Fix any impression i ∈ I R . Let a and a (cid:48) be the advertisers chosen in this randomized round.We will show that there are at most two out-arcs ( i, i (cid:48) ) a w.r.t. advertiser a . Then, by symmetricarguments, there are at most two out-arcs w.r.t. advertiser a (cid:48) , and at most two in-arcs w.r.t. eachof a and a (cid:48) . Putting together proves the lemma.Let i and i be the next two randomized rounds after i which semi-assign to advertiser a . Weclaim that i has no out-arcs to any node other than i and i w.r.t. advertiser a , because when bidsare large every point y ∈ [0 , B a ) is semi-assigned at least once in rounds i and i . Hence, i < i (cid:48) cannot be adjacent for any later impression i (cid:48) (cid:54) = i , i because for any choice of y ∈ [0 , B a ) therealways exists i (cid:48)(cid:48) = i or i such that i < i (cid:48)(cid:48) < i (cid:48) and y ∈ Y ai (cid:48)(cid:48) . Ex-post Dependence Graph.
The PanOCS algorithms in this paper follow the same recipe:construct a random subgraph D ∗ of D ; then, for every arc ( i, i (cid:48) ) a in the subgraph D ∗ , it correlatesthe selections in i and i (cid:48) perfectly negatively. We call D ∗ the ex-post dependence graph. Thesubgraph D ∗ must not introduce positive correlation among related nodes. For instance, we cannotinclude both ( i, i (cid:48) ) a and ( i (cid:48) , i (cid:48)(cid:48) ) a in D ∗ if i and i (cid:48)(cid:48) are related, or else the decisions in i and i (cid:48)(cid:48) will beperfectly positively correlated. We construct D ∗ to be a random matching. Hence, conditioned onany realization of D ∗ , any pair of nodes are either independent or perfectly negatively correlated.Concretely, on the arrival of each node i , we pick an incident arc randomly each with probability ; an arc of D is included in D ∗ if both incident nodes pick it. See Algorithm 2. Even though the out-arcs have yet to reveal themselves, the PanOCS may reference them as the first and secondout-arcs w.r.t each of the two chosen advertisers. lgorithm 2 Panoramic Online Correlated Selection (Large Bids, γ = ) for all impression i ∈ I R do add arcs from i (cid:48) to i to D for every existing i (cid:48) that correlates with i randomly pick one of its at most 8 incident arc in D , each with probability if it picks an in-arc, say, ( i (cid:48) , i ) a , and i (cid:48) also picks the arc then add arc ( i (cid:48) , i ) a to D ∗ select a if it does not select a for i (cid:48) , and select a (cid:48) otherwise (i.e., the opposite selection) else select with a fresh random bit end ifend for Proof of Theorem 16.
Fix any advertiser a and any point y ∈ [0 , B a ). Let i < i < · · · < i k bethe subset of impressions that are semi-assigned to advertiser a and subsets containing y . Supposethere exists an arc in D ∗ between a pair of these nodes. Then, Algorithm 2 picks advertiser a in exactly one of these two rounds. If no such arc exists, Algorithm 2 picks a with probabilityhalf independently in each of the k rounds; the probability that a is never chosen is 2 − k . Hence, itsuffices to show that the probability that there is no arc in D ∗ among i to i k is at most (1 − ) k − .In fact, we will show a slightly stronger result. For any 0 ≤ m ≤ k , let F m denote the eventthat no arc of the form ( i j , i j +1 ) a exists among the first m nodes. Let f m denotes its probability.Observe that when m = k , this is a necessary condition of the event we seek to analyze. Hence, itis sufficient to bound the probability of F k . We claim that f m is recursively defined as follows: f = f = 1 , f m = f m − − f m − . (31)Observe that f m − ≥ f m − because F m − is a subevent of F m − . The desired bound follows by: f m = f m − − f m − ≤ (cid:0) − (cid:1) f m − . The base cases of Eqn. (31) are trivial. It remains to show the recurrence. To do so, we furtherintroduce an auxiliary subevent G m for any 1 ≤ m ≤ k −
1, which not only requires F m , but alsothat the last impression i m picks arc ( i m , i m +1 ) a . Let g m denote the probability of the subevent.For the subevent to happen, the last impression i m must pick a specific arc ( i m , i m +1 ) a , whichhappens with probability . This choice of i m further ensures that arc ( i m − , i m ) cannot be realized.Therefore, the rest of the restriction is simply that no arc ( i j , i j +1 ) a is realized in the first m − g m = 18 f m − . For the original event to happen, on the other hand, there are two cases. The first case is when i m picks arc ( i m − , i m ) a , which happens with probability . In this case the rest of the restrictionrequires not only that no arc ( i j , i j +1 ) a is realized in the first m − i m − cannot pick arc ( i m − , i m ) a . In other word, it corresponds to the event w.r.t. the first m − excluding the subevent . The other case is when i m picks one of the other 7 arcs, whichhappens with probability . In this case it once again reduces to having no arc ( i j , i j +1 ) a realizedin the first m − f m = 18 (cid:0) f m − − g m − (cid:1) + 78 f m − . Importantly, this holds even if the arc was due to an advertiser other than a . lgorithm 3 Panoramic Online Correlated Selection (Large Bids) parameter: ≤ p ≤
1, the probability of being a sender for all impression i ∈ I R do add arc ( i (cid:48) , i ) a to D for any existing i (cid:48) adjacent to i w.r.t. some advertiser a s.t. b ai , b ai (cid:48) > B a i is a sender with probability p :select with a fresh random bitrandomly pick an out-arc in D , each with probability i is a receiver with probability 1 − p : if some sender i (cid:48) picks an in-arc ( i (cid:48) , i ) a of i in D then randomly pick such an in-arc ( i (cid:48) , i ) a and add it to D ∗ select a if it does not select a for i (cid:48) , and vice versa (i.e., the opposite selection) else select with a fresh random bit end ifend for Cancelling g m using the previous equation proves Eqn. (31). We further adopt the nomenclature from the research on OCS [19, 20]. In the warmup PanOCSin the last subsection, whenever a node i picks an out-arc, say, ( i, i (cid:48) ) a , the PanOCS uses a freshrandom bit to select an advertiser-subset combination in the round i ; further, the selection of round i is ready to be received by node i (cid:48) should i (cid:48) also picks arc ( i, i (cid:48) ) a . To this end, we call node i a sender if it picks an out-arc, and call it a receiver otherwise. In the warmup algorithm, a sender’srandom bit is sent to one other node, which is received only if the other node chooses to be areceiver and further the randomly chosen in-arc happens to be the one from the sender .Next, we refine the italic part above to obtain the improved ratio in Theorem 5. We still let eachnode be a sender or a receiver randomly. Further, each sender still sends its selection to a randomout-neighbor. Each receiver, however, proactively checks whether any in-neighbors are senders whopick its corresponding in-arcs. If so, it randomly picks one such in-arc. Finally, we optimize theprobability of letting a node be a sender to obtain the ratio stated in Theorem 5.We further define the algorithm more generally for arbitrary bids, where the correlation occursonly among large ones. The more general definition will be useful in the hybrid algorithm in thenext section. See Algorithm 3.The next lemma analyzes Algorithm 3 for any choice of sender probability 0 < p <
1. Theorem 5,i.e., γ = 0 . p = . Lemma 18.
If all nonzero bids are large, Algorithm 3 is a γ -PanOCS for: γ = 14 (cid:0) − p (cid:1) p (cid:0) − p (cid:1) . (32)Recall the intuition behind a γ -PanOCS: any two adjacent impressions (which correspond totwo neighboring nodes in D ) are perfectly negatively correlated (which correspond to being in D ∗ )with probability γ ; moreover, the events are as least as good as independent. Before diving intoa formal proof of Lemma 18, we explain the intuition why the marginal probability of realizing25n arc ( i (cid:48) , i ) a in D ∗ may be the above value of γ . For an arc to be in D ∗ , i must be a receiver,which happens with probability 1 − p , and must pick arc ( i (cid:48) , i ) a , which happens with probability .Further, i (cid:48) must be a receiver, which happens with probability p . Conditioned on all of the above,what is the chance that ( i (cid:48) , i ) a is in D ∗ ? Each in-neighbor is a sender with probability p , picks thein-arc of i with probability , and finally wins over ( i (cid:48) , i ) a with probability . Hence, the threein-arcs of i other than ( i (cid:48) , i ) a together prevent i from choosing ( i (cid:48) , i ) a with probability at most p .This overestimation of the failure probability gives the stated value of γ .We will in fact prove the following lemma that further applies to the more general case with amixture of large and small bids. Lemma 18 follows as a direct corollary. The more general lemmawill be useful in the hybrid algorithm in the next section. For any advertiser a and any any point y ∈ [0 , B a ), let k La ( y ) ≤ k a ( y ) denote the number of impressions semi-assigned to advertiser a andsubsets containing y whose bids are at least B a , before the first time y is semi-assigned to a smallbid, i.e., smaller than B a . If y has never been semi-assigned to a small bid, let k La ( y ) = k a ( y ). Lemma 19.
For the γ in Eqn. (32) , Algorithm 3 satisfies that for any advertiser a and any point y ∈ [0 , B a ) , y is assigned at least once with probability at least: − − k a ( y ) · (1 − γ ) max { k La ( y ) − , } . Proof.
Fix any advertiser a and any point y ∈ [0 , B a ). Let i < i < · · · < i k a ( y ) be the impressionssemi-assigned to advertiser a and subsets containing y . If there is an arc in the ex-post dependencegraph D ∗ between two of them, point y is assigned in exactly one of the two rounds. Otherwise,each of the k rounds independently has probability half of assigning y ; there is only a 2 − k a ( y ) probability that y is never assigned. Hence, it remains to analyze the former event and show thatthe probability of having no arc among i < i < · · · < i k in the ex-post dependence graph D ∗ isat most (1 − γ ) k La ( y ) − for the stated value of γ in the lemma. We will upper bound this by theprobability of having no arcs among the first k La ( y ) impressions i < i < · · · < i k La ( y ) .Concretely, for any 0 ≤ m ≤ k La ( y ), let F m denote the event that there is no arc in D ∗ among i (cid:96) , 1 ≤ (cid:96) ≤ m . Let f m denote the probability of F m . Trivially we have f = f = 1.Next, we inductively derive the following upper bound of f m for 2 ≤ m ≤ k La ( y ): f m ≤ (1 − γ ) · f m − . To do so, further consider an auxiliary subevent G m of F m for any 1 ≤ m ≤ k La ( y ) −
1, whichfurther requires that i m is a sender who picks arc ( i m , i m +1 ) a . Let g m be the probability of G m . Auxiliary Event.
In order to have event G m , we need:1. Node i m is a sender (probability p );2. Node i m picks ( i m , i m +1 ) a (probability ); and3. Event F m − (probability f m − ).These conditions are independent since they rely on disjoint subsets of random bits. We have: g m = p f m − . (33) Main Event.
Next we turn to event F m . There are two subcases: i m is a sender, or a receiver.26 ase 1: Sender. If i m is a sender, F m cannot fail due to a pair of nodes including i m . Hence, itremains to ensure F m − . The contribution of this case to the probability of F m is: pf m − (34) Case 2: Receiver.
In this case, we need to further ensure that F m does not fail due to a pair ofnodes including i m . There are at most 4 in-arcs of concern. First, there is always an arc ( i m − , i m ) a in D . Further, there may some other arcs, either from i m − to i m w.r.t. advertiser a , or from some i (cid:96) , 1 ≤ (cid:96) ≤ m − n ≤ n = 0, which we shall demonstrate first. Case 2a: n = 0 . The only arc incident to i m of concern is ( i m − , i m ) a . As a result, it is sufficient(but not necessary in general) if the first m − F m − \ G m − . This part contributes:(1 − p ) (cid:0) f m − − g m − (cid:1) . (35)Even if the first m − G m − , there may not be a perfect negative correlationbetween the selections of impressions i m − and i m , due to the competition from the other in-arcsof i m in D . Concretely, there are 3 in-arcs other than ( i m − , i m ) a . Moreover, by the assumption ofthe subcase, these in-arcs are not from i (cid:96) for any 1 ≤ (cid:96) ≤ m − i m withprobability p independently. The probability that i m picks i m − equals: (cid:88) i =0 i + 1 (cid:18) p (cid:19) i (cid:18) − p (cid:19) − i (cid:18) i (cid:19) = (cid:88) i =0 (cid:18) p (cid:19) i (cid:18) − p (cid:19) − i (cid:18) i + 1 (cid:19) = 1 p (cid:18) − (cid:0) − p (cid:1) (cid:19) . Through a similar calculation, we further conclude that if there are two in-arcs from the samenode, the probability that i m picks i m − is 1 − p + p . The bounds in both cases are greaterthan 1 − p for any 0 ≤ p ≤ F m is at most:(1 − p ) (cid:18) − (cid:0) − p (cid:1)(cid:19) g m − . (36)Summing up the contributions in Equations (34), (35), and (36), cancelling g m − using Eqn. (33),and plugging in the value of γ , we have: f m ≤ f m − − γ · f m − ≤ (1 − γ ) f m − . Case 2b: ≤ n ≤ . We will show that the probability of F m in this case is upper bounded bythe previous case. In this case, having the first m − F m − \ G m − is no longer a sufficientcondition since there may still be an arc from i (cid:96) , 1 ≤ (cid:96) ≤ m − i m . Nonetheless, Eqn. (34) isstill an upper bound of the contribution of this part.Next, consider the case when the first m − G m − . Comparing with the previouscase, the main difference is that i m may have in-arcs in D other than ( i m − , i m ) a from some i (cid:96) ,1 ≤ (cid:96) ≤ m −
1. However, should i m pick one of these in-arcs, we still get that y has been selected.Intuitively, it is less likely than the previous case that i m picks an in-arc not from i (cid:96) , 1 ≤ (cid:96) ≤ m − m − G m − . Suppose n (cid:48) ≤ n in-neighbors of i m of the form i (cid:96) , 1 ≤ (cid:96) ≤ m −
2, are senders who pick the arc to i m . If each ofthe 3 − n other in-arcs from from distinct nodes, each of the 3 − n other in-neighbors is a senderwho picks the arc to i m independently with probability p . The probability that i m picks one of thein-arcs from in-neighbors of the form i (cid:96) is: − n (cid:88) i =0 n (cid:48) i + n (cid:48) + 1 (cid:18) p (cid:19) i (cid:18) − p (cid:19) − n − i (cid:18) − ni (cid:19) ≥ − n (cid:88) i =0 i + 1 (cid:18) p (cid:19) i (cid:18) − p (cid:19) − n − i (cid:18) − ni (cid:19) = − n (cid:88) i =0 − n (cid:18) p (cid:19) i (cid:18) − p (cid:19) − n − i (cid:18) − ni + 1 (cid:19) = 4(4 − n ) p (cid:18) − (cid:0) − p (cid:1) − n (cid:19) . Observe that x − (cid:0) − (1 − p ) x (cid:1) is decreasing in x , the above is greater than the bound in theprevious case which corresponds to n = 0.Finally, if there are two of the 3 − n other in-arcs are from the same neighbor, it must be n = 1.This in-neighbor is a sender who picks one of these two arcs with probability p . We have a similarcalculation for the probability that i m picks one of the in-arcs from in-neighbors of the form i (cid:96) : p · n (cid:48) n (cid:48) + (cid:0) − p (cid:1) ≥ p ·
12 + (cid:0) − p (cid:1) = 1 − p , which is also greater than the 1 − p bound needed in the analysis. Challenge.
In the presence of both large and small bids, a semi-assignment of an impression i to an advertiser-subset combination ( a, Y ai ) may be adjacent to an arbitrary number of subsequentsemi-assignments of smaller bids. For instance, consider an impression with a large bid b ai = B a followed by n impressions with small bids b ai (cid:48) = B a n . Therefore, the previous approach of lettingeach impression randomly picks an out-arc in the ex-ante dependence graph D no longer worksbecause the probability that an arc ( i, i (cid:48) ) a in the ex-ante dependence graph D is included in the ex-post dependence graph D ∗ may be arbitrarily small. Solution.
For any advertiser a , we will partition the impressions semi-assigned to a with smallbids into groups. Then, define group-level correlation similar to the case of large bids, treatingthe union of impressions within the same group as a single large bid. We will argue that eachgroup is correlated with a bounded number of other groups. Finally, recall that any impression in arandomized round is associated with two advertisers and thus, belongs to two groups, one for eachadvertiser. Pick one of the two groups randomly and follow its decision.It is worth remarking that the impressions in the same group are positively correlated. Hence,two impressions in the same group, say, w.r.t. an advertiser a must be unrelated. In other words,other than the common advertiser a , the two impressions are semi-assigned to either two distinctadvertisers, or the same advertiser but disjoint subsets.The rest of the subsection will substantiate the above intuition with a formal definition of thealgorithm and its analysis. We will build on the notions of two nodes’ being related, adjacent, andunrelated (Definition 3). 28 irst-level Partition. For each advertiser a , let I Ra denote the set of randomized round whichsemi-assign the impressions to a . We shall greedily partition I Ra into subsets of impressions thatare pairwise unrelated w.r.t. advertiser a , denoted as I ja , j ≥ j a = 1 and I ja = ∅ for any j ≥ i ∈ I Ra :(a) Let j a ← j a + 1 if impression i is adjacent to the first impression in I j a a w.r.t. a .(b) Let I j a a ← I j a a ∪ { i } .Next we establish several properties of the above greedy partition. By the above definition andthe panoramic interval-level assignment in Section 3, we have: Lemma 20.
Any two impressions in the same subset I ja are unrelated w.r.t. advertiser a . The next property shows that the subsets restore the main structural property of large bids,i.e., Lemma 17. Indeed, if all bids were large, each impression would be a subset on its own.
Lemma 21.
For any neighboring subsets I ja and I j +1 a , and any point y ∈ [0 , B a ) which is not deterministically assigned, y is semi-assigned at least once in the rounds in I ja and I j +1 a .Proof. In fact, we will show a stronger claim that the rounds in subset I ja and the first round in I j +1 a suffice. By the definition of the greedy partition algorithm, the subset of [0 , B a ) chosen in thefirst round in I j +1 a intersects with the subset of the first round in I ja ; otherwise, it would have beenadded to I ja instead. Hence, the points in this intersection have already been semi-assigned twice.Finally, by the panoramic interval-level assignment in Section 3, any point y ∈ [0 , B a ) that has notbeen deterministically assigned thus far must have been semi-assigned at least once.This lemma has two direct corollaries. Corollary 22.
For any advertiser a there are at most k max nonempty subsets I ja . Corollary 23.
Suppose two impressions belong to subsets I ja and I ka respectively, where k > j + 2 .Then, they are not adjacent w.r.t. a . Second-level Partition.
An pair of impressions in the same subset I ja could still be related oreven adjacent w.r.t. an advertiser other than a . To resolve this we further introduce another layerof partition of each I ja into ∪ k I j,ka as follows: ∀ k ∈ Z + : I j,ka = I ja ∩ (cid:18) (cid:91) a (cid:48) (cid:54) = a I ka (cid:48) (cid:19) . (37)In other words, an impression i is in I j,ka if it belongs to the j -th subset I ja w.r.t. advertiser a in the first-level partition, and further belongs to the k -th subset I ka (cid:48) of the other advertiser a (cid:48) towhich i is semi-assigned. We shall refer to each I j,ka as a group of impressions.As a corollary of Lemma 20, we have: Corollary 24.
Any two impressions in the same group I j,ka are unrelated. roup-level Decision. Fix any advertiser a ∈ A . We say that two groups I j,ka and I j (cid:48) ,k (cid:48) a areadjacent if there exist impressions i ∈ I j,ka and i (cid:48) ∈ I j (cid:48) ,k (cid:48) a such that i and i (cid:48) are adjacent w.r.t. a .Then, as a further corollary of Lemma 20, Corollary 22, and Corollary 23, we have: Corollary 25.
Any group I j,ka is adjacent to at most k max other groups I j (cid:48) ,k (cid:48) a , whose superscriptsare j (cid:48) ∈ { j − , j − , j + 1 , j + 2 } and ≤ k (cid:48) ≤ k max . The group-level (negative) correlation is achieved using the following algorithm similar to thePanOCS for large bids, treating each group as a large bid. For each group I j,ka , it returns either a or ¬ a with 50-50 marginal probability.Concretely, define an ex-ante dependence graph D a for each advertiser a . Let there be a node foreach group I j,ka . Further, let there be an arc from I j,ka to I j (cid:48) ,k (cid:48) a if they are adjacent, and j < j (cid:48) . Thesecond condition indicates that arcs are from earlier groups to later ones. The algorithm constructsan ex-post dependence graph D ∗ a similar to the PanOCS for large bids. It is parameterized by0 < p <
1, the probability of letting each group be a sender. For each group:1. With probability p , let it be a sender :(a) Pick a subsequent adjacent group I j (cid:48) ,k (cid:48) a , j (cid:48) ∈ { j + 1 , j + 2 } , 1 ≤ k (cid:48) ≤ k max , randomly.(b) Return a or ¬ a uniformly at random with a fresh random bit.2. Otherwise, let it be a receiver :(a) If there exists a previous adjacent group I j (cid:48) ,k (cid:48) a which is a sender and picks I j,ka , makesthe opposite decision, i.e., return a if group I j (cid:48) ,k (cid:48) a returns ¬ a , and vice versa.(b) Otherwise, return a or ¬ a uniformly at random with a fresh random bit. Impression-level Decision.
Recall that the impression of each randomized round is associatedwith two advertisers a and a (cid:48) and thus, two corresponding groups I j,ka and I j (cid:48) ,k (cid:48) a (cid:48) . Follow the decisionof one of the groups, chosen uniformly at random. By following the decision of a group, say, I j,ka ,the PanOCS picks a if the group picks a , and picks a (cid:48) if the group picks ¬ a .The PanOCS for general bids is summarized in Algorithm 4. We next show a general analysisof the algorithm for any value of 0 < p < Lemma 26.
Algorithm 4 is a γ -PanOCS, where: γ = 116 k max (cid:0) − p (cid:1) (cid:18) − (cid:0) − p k max (cid:1) k max (cid:19) . Then, Theorem 6 follows as a corollary, observing the stated value of γ is at least:116 k max (1 − p )(1 − e − p ) , and letting p = 2 − W ( e ) ≈ . W ( · ) is the product logarithm. Proof.
Fix any advertiser a and any point y ∈ [0 , B a ). Let i < i < · · · < i k be the impressionssemi-assigned to advertiser a and subsets containing y . Let j < j < · · · < j k be the subsets suchthat i (cid:96) ∈ I j (cid:96) a . Recall that each subset I j (cid:96) a is further partitioned into groups and the impression i (cid:96) lgorithm 4 Panoramic Online Correlated Selection (General Bids, Parameter 0 < p < j a = 1, I ja = ∅ , and I j,ka = ∅ for any a ∈ A , any j ≥
1, and any 1 ≤ k ≤ k max . for all group I j,ka do with probability p , let it be a sender :let its decision be a or ¬ a with a fresh random bitrandomly pick an I j (cid:48) ,k (cid:48) a , j (cid:48) ∈ { j + 1 , j + 2 } , 1 ≤ k (cid:48) ≤ k max , as the potential receiverotherwise, i.e., with probability 1 − p , let it be a receiver :randomly pick a sender I j (cid:48) ,k (cid:48) a , j (cid:48) ∈ { j − , j − } , 1 ≤ k (cid:48) ≤ k max , who picks I j,ka let I j,ka ’s decision be a if I j (cid:48) ,k (cid:48) a ’s decision is ¬ a , and vise versaif no such I j (cid:48) ,k (cid:48) a exists, let its decision be a or ¬ a with a fresh random bit end forfor all impression i ∈ I R , say, semi-assigned to ( a, S ) and ( a (cid:48) , S (cid:48) ) do let j a ← j a + 1 if i is adjacent to the first impression in I j a a w.r.t. a let j a (cid:48) ← j a (cid:48) + 1 if i is adjacent to the first impression in I j a (cid:48) a (cid:48) w.r.t. a (cid:48) let I j a a ← I j a a ∪ { i } , and I j a (cid:48) a (cid:48) ← I j a (cid:48) a (cid:48) ∪ { i } let I j a ,j a (cid:48) a ← I j a ,j a (cid:48) a ∪ { i } and I j a (cid:48) ,j a a (cid:48) ← I j a (cid:48) ,j a a (cid:48) ∪ { i } follow either I j a ,j a (cid:48) a or I j a (cid:48) ,j a a (cid:48) ’s decision, each with probability half end for belongs to exactly one group. Nonetheless, the index in the second-level partition is unimportantfor the argument; we write the group as I j (cid:96) , ∗ a .Suppose there exists an arc in the ex-post dependence graph D ∗ a between two of I j (cid:96) , ∗ a , 1 ≤ (cid:96) ≤ k ,and further the PanOCS chooses to follows I j (cid:96) , ∗ a ’s decision in these two rounds. Then, point y is assigned in exactly one of the two rounds. Otherwise, each of the k rounds independently hasprobability half of assigning y . Hence, it remains to analyze the former event and upper bound theprobability that it does not happen by (1 − γ ) k − , for the stated value of γ in the lemma.Concretely, for any 0 ≤ m ≤ k , let F m denote the event that for any two nodes I j (cid:96) , ∗ a , 1 ≤ (cid:96) ≤ m ,either there is no arc in D ∗ a between them, or the PanOCS does not follow their decisions in atleast one of the two rounds. Let f m denote the probability of F m . Trivially we have f = f = 1.Next, we inductively derive the following upper bounds of f m for 2 ≤ m ≤ k : f m ≤ (1 − γ ) · f m − . To do so, further consider an auxiliary subevent G m of F m for any 1 ≤ m ≤ k −
1, which furtherrequires that I j m , ∗ a is a sender who picks I j m +1 , ∗ a and the PanOCS follows I j m , ∗ a ’s decision. Let g m be the probability of G m . Auxiliary Event.
In order to have event G m , we need:1. The PanOCS follows I j m , ∗ a ’s decision (probability );31. Node I j m , ∗ a is a sender (probability p );3. Node I j m , ∗ a picks I j m +1 , ∗ a (probability k max ); and4. Event F m − (probability f m − ).Further observe that these conditions are independent since they rely on disjoint subsets ofrandom bits. We have: g m = p k max f m − . (38) Main Event.
Next we turn to event F m . There are two cases depending on the conditions below,which are independent because they rely on disjoint subsets of random bits.1. The PanOCS follows I j m , ∗ a ’s decision (probability );2. Node I j m , ∗ a is a receiver (probability 1 − p ). Case 1: Last Node Does Not Matter.
If at least one of the above conditions do not hold, F m cannot fail due to a pair of nodes including I j m , ∗ a . Hence, it remains to ensure F m − . Thecontribution of this case to the probability of F m is: (cid:0) − − p (cid:1) f m − (39) Case 2: Last Node Matters.
If both conditions hold, we need to further ensure that F m doesnot fail due to a pair of nodes including I j m , ∗ a . There are two pairs of nodes of concern. First,there is always an arc from I j m − , ∗ a to I j m , ∗ a in D a . We need either that the arc is not realized in D ∗ a , or that the PanOCS does not follows I j m − , ∗ a ’s decision. Here, recall the assumption that thePanOCS does follow I j m , ∗ a ’s decision. The second pair of nodes is I j m − , ∗ a and I j m , ∗ a . There mayor may not be an arc between them in the ex-ante dependence graph D a , depending on whether j m = j m − + 2. This further divides the rest of the analysis into two subcases Case 2a.
The first subcase is when j m > j m − + 2, which is also the bottleneck of the analysis.Then, there cannot be an arc from I j m − , ∗ a to I j m , ∗ a in the dependence graphs. The only arc ofconcern is from I j m − , ∗ a to I j m , ∗ a . As an immediate implication, it is sufficient (but not necessary ingeneral) if the first m − F m − \ G m − . The contribution of this part is:1 − p (cid:0) f m − − g m − (cid:1) . (40)Even if the first m − G m − , there may not be a perfect negative correlationbetween the decisions of impressions i m − and i m , due to the competition from other in-neighborsof I j m , ∗ a in D a . Concretely, there are 4 k max − I j m − , ∗ a . Further, by theassumption of the subcase, these in-neighbors are not I j (cid:96) a for any 1 ≤ (cid:96) ≤ m − p k max of being a sender who picks I j m , ∗ a . The probability that32 j m , ∗ a picks I j m − , ∗ a instead of one of these competitors is equal to: k max − (cid:88) i =0 i + 1 (cid:18) p k max (cid:19) i (cid:18) − p k max (cid:19) k max − − i (cid:18) k max − i (cid:19) = k max − (cid:88) i =0 k max (cid:18) p k max (cid:19) i (cid:18) − p k max (cid:19) k max − − i (cid:18) k max i + 1 (cid:19) = 1 p (cid:18) − (cid:0) − p k max (cid:1) k max (cid:19) . Hence, the contribution of this part to the probability of F m is:1 − p (cid:18) − p (cid:18) − (cid:0) − p k max (cid:1) k max (cid:19)(cid:19) g m − . (41)Summing up the contributions in Equations (39), (40), and (41), cancelling g m − using Eqn. (38),and plugging in the value of γ , we have: f m = f m − − γ · f m − ≤ (1 − γ ) f m − . Case 2b.
The second subcase is when j m = j m − + 2. We will show that the probability of F m is upper bounded by the previous case. In this case, having the first m − F m − \ G m − is no longer a sufficient condition since there may still be an arc I j m − , ∗ a to I j m , ∗ a . Nonetheless,Eqn. (40) is still an upper bound of the contribution of this part.Next, consider the case when the first m − G m − . Comparing with the previouscase, the main difference is that I j m − , ∗ a is also an in-neighbor of I j m , ∗ a in D a . However, I j m − , ∗ a serve as an competitor only when the PanOCS does not follow its decision; otherwise, having anarc from I j m − , ∗ a to I j m , ∗ a in the ex-post dependence graph D ∗ a also precludes event G m − .Next, we argue that Eqn. (41) continues to serve as an upper bound of the contribution fromthis part. Formally, let H denote the event that I j m − , ∗ a is a sender who picks I j m , ∗ a , and furtherthe PanOCS does not follow I j m − , ∗ a ’s decision. We show that conditioned on G m − , event H holdswith probability at most p k max . Observe the intersection of events G m − and H is equivalent tothe following collection of independent conditions:1. The PanOCS follows I j m − , ∗ a ’s decision (probability );2. Node I j m − , ∗ a is a sender (probability p );3. Node I j m − , ∗ a picks I j m , ∗ a (probability k max );4. The PanOCS does not follows I j m − , ∗ a ’s decision (probability );5. Node I j m − , ∗ a is a sender (probability p );6. Node I j m − , ∗ a picks I j m , ∗ a (probability k max ); and7. Event F m − (probability f m − ). 33utting together, the joint event happens with probability: p k f m − . Then, the conditional probability bound is: Pr (cid:2) H | G m − (cid:3) = p k f m − g m − (Bayes’s rule)= p k max f m − f m − (Eqn. (38)) ≤ p k max . The last inequality is due to the observation that having event F m − and having PanOCS notfollow I j m − , ∗ a ’s decision is sufficient for F m − . This section gives a 0 . The hybrid algorithm is also an oblivious semi-randomized algorithm following the online primaldual framework. When an impression i arrives, each advertiser a makes two offers ∆ Ra β i and ∆ Da β i .They would be the increments of β i if i is semi-assigned or assigned to advertiser a respectively,determined by a specific dual update rule to be explained shortly. Then, the impression i eitherpicks two offers of the first kind, or one offer of the second kind, whichever maximizes β i .Recall that we call the first case a randomized round, and say that the impression is semi-assigned to the two advertisers and the corresponding subsets given by the panoramic interval-levelassignments. In this case, we select one of the advertiser-subset combinations using the PanOCSfor large bids in Algorithm 3, which has been presented in a generalized form that accepts bothlarge and small bids but handle them differently. In particular, it introduces negative correlationonly among large bids.Further recall that we call the second case a deterministic round, and say that the impression isassigned to the advertiser and the corresponding subset by the panoramic interval-level assignments.In sum, the hybrid algorithm, defined in Algorithm 5, is almost identical to the basic algorithmat the high-level. However, it uses a PanOCS that handles large and small bids differently, whichin turn leads to a different dual update rule and different definitions of the offers ∆ Ra β i and ∆ Da β i .The next subsections detail these differences. 
34 lgorithm 5 Hybrid Algorithm (cid:0)
Parameterized by ∆ α RL ( · ), ∆ α RR ( · ), ∆ α DL ( · ), and ∆ α DR ( · ) (cid:1) state variables: k a ( y ) ≥
0, number of times y is semi-assigned; k a ( y ) = ∞ if y is assigned in a deterministic round for all impression i ∈ I dofor all advertiser a ∈ A do computer subset Y ai ⊆ [0 , B a ) using panoramic interval-level assignment (Seciton 3)compute ∆ Ra β i and ∆ Da β i according to Equations (44) and (45) end for find a , a that maximize ∆ Ra β i , and a ∗ that maximizes ∆ Da β i if ∆ Ra β i + ∆ Ra β i ≥ ∆ Da ∗ β i assign i to what PanOCS (large bids) selects between a and a and the correspondingsubsets else (i.e., ∆ Ra β i + ∆ Ra β i < ∆ Da ∗ β i ) assign i to a ∗ and the corresponding subset endifend for6.1.2 Primal Increments We first introduce a lower bound of the primal objective. In the rest of the section, let γ = 0 . k La ( y )denote the number of times y is semi-assigned by large bids before it is semi-assigned by a smallbid. For any advertiser a ∈ A and any point y ∈ [0 , B a ), define:¯ x a ( y ) def = − − k a ( y ) y ∈ [0 , B a ) ;1 − − k a ( y ) (1 − γ ) k La ( y ) − y ∈ [ B a , B a ) , k a ( y ) (cid:54) = 1 or k La ( y ) (cid:54) = 0 ; − γ y ∈ [ B a , B a ) , k a ( y ) = 1 and k La ( y ) = 0 . By Lemma 19, for any advertiser a ∈ A and any point y ∈ [0 , B a ): x a ( y ) ≥ ¯ x a ( y )The different definitions for the first and second halves of the interval [0 , B a ) differently ismotivated by the online primal dual analysis of the small-bid algorithm of Mehta et al. [36] (seeAppendix A), which also handles the two halves differently. The lower bound of the second half isexactly the guarantee given by Lemma 19, apart from a special case when k a ( y ) = 1 and k La ( y ) = 0.The special case corresponds to when the first semi-assignment to y is a small bid. In this case, wepenalize the small bid in order to reserve sufficient primal increments for future semi-assignmentsthat are potentially large bids. The analysis below will substantiate this intuition. In the first half,however, we give up the correlation given by the PanOCS. We remark that it is not an inferiorchoice and can be seen as banking up primal increments for the future.Accordingly, define a surrogate primal objective which lower bounds the actual primal:¯ P def = (cid:88) a ∈ A (cid:90) ∞ ¯ x a ( y ) dy . Hence, it would suffice to show competitiveness of the algorithm w.r.t. the surrogate objective.35 rimal-increment Constants.
We continue to introduce some constants for the increment in¯ x a ( y ) as a point y gets further assignments and semi-assignments, depending on whether theyare from large or small bids. In the following discussions, subscripts L and R represent if y isin the left half or the right half of the interval [0 , B a ). Superscripts specify the nature of theassignments. In particular, D and R stand for deterministic assignments and randomized semi-assignments respectively. For points in the second half of the interval, a further superscript L or S indicates whether the assignments are from large or small bids.We start with the simpler left half of the interval, which follows from the definition of ¯ x a ( y ). Lemma 27.
For any advertiser a ∈ A and any point y ∈ [0 , B a ) : • The k -th semi-assignment to advertiser a and point y increases ¯ x a ( y ) by: ∆ x RL ( k ) def = 2 − k . • A deterministic assignment to advertiser a and point y after k − semi-assignmentsincreases ¯ x a ( y ) by: ∆ x DL ( k ) def = 2 − k +1 . As for the second half of the interval, the increments depend on the natural of the previous semi-assignments. Nonetheless, we shall define history-free constants by taking the smallest incrementover all possibilities.
Lemma 28.
For any advertiser a ∈ A and any y ∈ [ B a , B a ) : • The k -th semi-assignment to advertiser a and point y , if it is a small bid , increases ¯ x a ( y ) by at least: ∆ x RSR ( k ) def = − γ k = 1 ;2 − k (1 − γ ) k − k ≥ . • The k -th semi-assignment to advertiser a and point y , if it is a large bid , increases ¯ x a ( y ) by at least: ∆ x RLR ( k ) def = k = 1 ;2 − k (1 − γ ) k − (1 + γ ) k ≥ . • A deterministic assignment to advertiser a and point y after k − semi-assignmentsincreases ¯ x a ( y ) by at least: ∆ x DL ( k ) def = 2 − k +1 (1 − γ ) k − . Proof.
Recall the definition of ¯ x a ( y ). When y ∈ [ B a , B a ), it falls into the second or the third case,which we restate below. If k a ( y ) (cid:54) = 1 or k La ( y ) (cid:54) = 0, we call it the regular case:¯ x a ( y ) = 1 − − k a ( y ) (1 − γ ) k La ( y ) − . (42)If k a ( y ) = 1 and k La ( y ) = 0, we call it special case:¯ x a ( y ) = 12 − γ . (43)36 Semi-assignment, small bid)
The case of k = 1 follows by the definition in the special case,i.e., Eqn. (43). For k ≥
2, the expression of ¯ x a ( y ) follows the regular case in Eqn. (42), both beforeand after this semi-assignment, and k La ( y ) ≤ k − − k (1 − γ ) k La ( y ) − , The minimum is achieved when k La ( y ) = k −
1, and equals the ∆ x RSR ( k ) defined in the lemma. (Semi-assignment, large bid) First consider the binding case when all semi-assignments to y are large bids. Then, k a ( y ) = k La ( y ) holds throughout, in particular, before and after the semi-assignment at hand. Hence, if follows from the definition of ¯ x a ( y ) that its increment equals the∆ x RLR ( k ) defined in the lemma.Next, suppose there was some previous semi-assignment to y that is a small bid. If k = 2,it corresponds to the case the first semi-assignment is a small bid and the second one is a largebid. In other words, ¯ x a ( y ) before the large bid at hand equals − γ due to the special case in itsdefinition. After the semi-assignment of the large bid, ¯ x a ( y ) = by definition. The increment istherefore γ , which equals the ∆ x RLR ( k ) for k = 2 defined in the lemma. This part crucially usesthat ¯ x a ( y ) is smaller than in the special case, reserving part of the gain for the second round.Finally, suppose k ≥
3. The expression of ¯ x a ( y ) follows the regular case in Eqn. (42), bothbefore and after this semi-assignment, and k La ( y ) stays the same. Hence, the increment equals:2 − k (1 − γ ) k La ( y ) − . Further observe that we have k La ( y ) ≤ k − k − k − − k (1 − γ ) k − . This is strictly greater than the ∆ x RLR ( k ) defined in the lemma because 1 > (1 − γ )(1 + γ ). (Deterministic assignment) Observe that k La ( y ) ≤ k a ( y ) = k − x a ( y ), the increment equals:2 − k +1 (1 − γ ) k La ( y ) − , It is minimized when k La ( y ) = k −
1, and equals the ∆ x DR ( k ) defined in the lemma. The dual update rule is driven by two invariants. The first one is easy to state:
Invariant 1.
Dual increment equals the primal increment defined in the previous subsection.Next we explain another invariant that determines the value of α a ( y ) based on the status ofadvertiser a . It is similar to Eqn. (6) in the basic algorithm yet more complicated. To begin with,its counterpart in the basic algorithm uses only one group of parameters, while here it needs four: • ∆ α RL ( k ): Increment in α a ( y ) when y ∈ [0 , B a ) is semi-assigned for the k -th time. • ∆ α RR ( k ): Increment in α a ( y ) when y ∈ [ B a , B a ) is semi-assigned for the k -th time. • ∆ α DL ( k ): Increment in α a ( y ) when y ∈ [0 , B a ) is deterministically assigned if there are k − ∆ α DR ( k ): Increment in α a ( y ) when y ∈ [ B a , B a ) is deterministically assigned if there are k − α a ( y ) does not depend on whether the bid is small or large. This is intentional so thatthe subsequent of arguments for approximate dual feasibility is history-free. It also means thatthe costs of having small bids, which means smaller primal increments, are completely charged tothe β variables. This intuitively indicates that the algorithm is less likely to choose small bids inrandomized rounds compared to the large ones, all other things being equal.Further, we handle the left and right halves of the interval [0 , B a ) differently. Again, this ismotivated by the analysis of the small-bid algorithm, and is aligned with our definition of the primalincrements in the previous subsection.Finally, we introduce two additional groups of parameters to model the increments in α a ( y ) dueto deterministic assignments. By contrast, the basic algorithm uses an ad-hoc choice of ∆ α D ( k ) = (cid:80) ∞ (cid:96) = k ∆ α R ( (cid:96) ). We bring in the additional parameters because a constraint concerning the gain of β variables in deterministic assignments, i.e., Eqn. (12), is nonbinding in the analysis of the basicalgorithm. In other words, we could have let the β variables get less and let the α variables getmore in deterministic assignments. Although the slack is inconsequential in the basic algorithm,the extra gain is crucial for the hybrid algorithm and its analysis.Let k Ra ( y ) be the number of semi-assignments to y so that we retain this information even if y has been eventually deterministically assigned. Observe that k Ra ( y ) = k a ( y ) if k a ( y ) < ∞ . Invariant 2.
For any advertiser a ∈ A and any point y ∈ [0 , B a ): α a ( y ) = k Ra ( y ) (cid:88) (cid:96) =1 ∆ α R ( y, (cid:96) ) k a ( y ) (cid:54) = ∞ k Ra ( y ) (cid:88) (cid:96) =1 ∆ α R ( y, (cid:96) ) + ∆ α D ( k Ra ( y ) + 1) k a ( y ) = ∞ . α Variables
We further introduce notations ∆ α R ( y, k ) and ∆ α D ( y, k ) as follows so that subsequent integralsare more succinct. ∆ α R ( y, k ) def = ∆ α RL ( k ) y ∈ [0 , B a ) ;∆ α RR ( k ) y ∈ [ B a , B a ) . ∆ α D ( y, k ) def = ∆ α DL ( k ) y ∈ [0 , B a ) ;∆ α DR ( k ) y ∈ [ B a , B a ) . By the invariant regarding the α variables and the accounting by the point-level in Eqn. (2),the dual increment of α a when an impression is semi-assigned or assigned to advertiser a and thecorresponding subset Y ai are ( k a ( y )’s are the values before the assignment):∆ Ri α a = (cid:90) Y ai ∆ α R ( y, k a ( y ) + 1) dy, ∆ Di α a = (cid:90) Y ai ∆ α D ( y, k a ( y ) + 1) dy. .1.5 Dual Increments: β Variables
By the first invariant, the definition of primal increments, and the definition of dual increments interms of the α variables, the increments of β variables by the point-level have been pinned down: • The k -th semi-assignment, y ∈ [0 , B a ):∆ β RL ( k ) def = ∆ x RL ( k ) − ∆ α RL ( k ) . • The k -th semi-assignment, small bid, y ∈ [ B a , B a ):∆ β RSR ( k ) def = ∆ x RSR ( k ) − ∆ α RR ( k ) . • The k -th semi-assignment, large bid, y ∈ [ B a , B a ):∆ β RLR ( k ) def = ∆ x RLR ( k ) − ∆ α RR ( k ) . • Deterministic assignment after k − y ∈ [0 , B a ):∆ β DL ( k ) def = ∆ x DL ( k ) − ∆ α DL ( k ) . • Deterministic assignment after k − y ∈ [ B a , B a ):∆ β DR ( k ) def = ∆ x DR ( k ) − ∆ α DR ( k ) . The next lemma shows that the β increments for small bids are smaller than those for largebids. This is because the primal increments for small bids are smaller, while the increments in α variables are the same for both small and large bids by definition. Lemma 29.
For any k ≥ , we have: ∆ β RLL ( k ) ≥ ∆ β RSL ( k ) , ∆ β RLR ( k ) ≥ ∆ β RSR ( k ) . Again, we further introduce notations β RS ( y, k ), β RL ( y, k ), and β D ( y, k ) in order to makeintegrals succinct in subsequent arguments.∆ β RL ( y, k ) def = ∆ β RL ( k ) y ∈ [0 , B a ) ;∆ β RLR ( k ) y ∈ [ B a , B a ) . ∆ β RS ( y, k ) def = ∆ β RL ( k ) y ∈ [0 , B a ) ;∆ β RSR ( j ) y ∈ [ B a , B a ) . ∆ β D ( y, k ) def = ∆ β DL ( k ) y ∈ [0 , B a ) ;∆ β DR ( j ) y ∈ [ B a , B a ) . Therefore, the increments of β i when i is semi-assigned or assigned to a , i.e., the offers ∆ Ra β i and ∆ Da β i in the hybrid algorithm, are defined as:∆ Ra β i def = (cid:90) Y ai ∆ β RL ( y, k a ( y ) + 1) dy large bid, i.e., B a ≤ b ai ≤ B a ; (cid:90) Y ai ∆ β RS ( y, k a ( y ) + 1) dy small bid, i.e., 0 ≤ b ai < B a . (44)∆ Da β i def = (cid:90) Y ai ∆ β D ( y, k a ( y ) + 1) dy . (45)39 .1.6 Regularity Constraints Finally, we introduce several sets of regularity constraints on the parameters. The first two corre-spond to the monotonicity assumption, i.e., Eqn. (14), and the superiority of randomized round,i.e., Eqn. (11), in the basic algorithm. The last one assumes nonnegativity of the dual variables.From now on, we shall label the constraints that will appear in the LP for optimizing the parametersand the competitive ratio by (C1), (C2), (C3), and so on.
Monotonicity.
The first set of constraints state that ∆ β RL ( y, k ), ∆ β RS ( y, k ) and ∆ β D ( y, k ) arenondecreasing w.r.t. y + k · B a . That is, they are larger for a smaller k , and conditioned on thesame k they are larger for a smaller y ’s (i.e., a point y on the left half of the interval). ∀ k ≥ β RLL ( k ) ≥ ∆ β RLR ( k ) , ∆ β RLR ( k ) ≥ ∆ β RLL ( k + 1) , (C1) ∀ k ≥ β RSL ( k ) ≥ ∆ β RSR ( k ) , ∆ β RSR ( k ) ≥ ∆ β RSL ( k + 1) , (C2) ∀ k ≥ β DL ( k ) ≥ ∆ β DR ( k ) , ∆ β DR ( k ) ≥ ∆ β DL ( k + 1) . (C3)We state below two useful consequences of the monotonicity. Lemma 30.
The panoramic interval-level assignment chooses a subset Y ai that maximizes the offers ∆ Ra β i in Eqn. (44) and ∆ Da β i in Eqn. (45) . Lemma 31.
The offers ∆ Ra β i in Eqn. (44) and ∆ Da β i in Eqn. (45) as functions of impression i ’sbid b ai are concave. Superiority of Randomized Rounds.
The second set of constraints implies that the algorithmprefers randomizing over two equally good advertisers with semi-assignments, over a deterministicassignment to only one of them. ∀ k ≥ · ∆ β RLL ( k ) ≥ ∆ β DL ( k ) , · ∆ β RSL ( k ) ≥ ∆ β DL ( k ) ; ∀ k ≥ · ∆ β RLR ( k ) ≥ ∆ β DR ( k ) , · ∆ β RSR ( k ) ≥ ∆ β DR ( k ) . (C4)Recall that the parameters satisfy that ∆ β RLL ( k ) ≥ ∆ β RSL ( k ) and ∆ β RLR ( k ) ≥ ∆ β RSR ( k ). In thissense, listing only the inequalities for small bids would have sufficed. Nevertheless, we opt to makeit more explicit above.As a direct corollary: Lemma 32.
For any advertiser a ∈ A and any impression i ∈ I : ∆ Da β i ≤ · ∆ Ra β i . Nonnegativity.
We shall choose the parameters ∆ α RL ( · ) , ∆ α DL ( · ) , ∆ α RR ( · ) , ∆ α DR ( · ) to be nonneg-ative. Further, we shall ensure that the corresponding β parameters are nonnegative. ∀ k ≥ α RL ( k ) ≥ , ∆ α RR ( k ) ≥ , ∆ α DL ( k ) ≥ , ∆ α DR ( k ) ≥ ∀ k ≥ β RL ( k ) ≥ , ∆ β RSR ( k ) ≥ , ∆ β RLR ( k ) ≥ , ∆ β DL ( k ) ≥ , ∆ β DR ( k ) ≥ . (C6) Lemma 33.
For any advertiser a ∈ A and any impression i , α a ≥ and β i ≥ . .2 Online Primal Dual Analysis In this section, we let the competitive ratio Γ also be a parameter to be optimized together withthe other ones in the analysis. Next, we derive a set of sufficient conditions for proving that thehybrid algorithm is Γ-competitive.
Reverse Weak Duality.
This holds for any choice of the parameters by the design of the primaldual algorithm. In particular, we ensure the invariant that dual increment equals the lower boundof the surrogate primal increment given in Lemma 27 and Lemma 28. Therefore, we have ¯ P ≥ D .Recall that the surrogate primal objective lower bounds the actual one, i.e., P ≥ ¯ P , we get reverseweak duality. Approximate Dual Feasibility.
Fix any advertiser a , and an impression set S . We restateapproximate dual feasibility below, where the contribution of α a is accounted by the point-level: (cid:90) B a α a ( y ) dy + (cid:88) i ∈ S β i ≥ Γ · b a ( S ) . . (46)Next, we will explain how to lower bound the α and β variables respectively, and will show acharging scheme that distributes the lower bounds of the β variables to the points y ∈ [0 , B a ). Theseare similar to their counterparts in the basic algorithm. Unlike the basic algorithm, however, herethe lower bound distributed to each point may not be at least Γ on its own. We will demonstratehow to prove Eqn. (46) by constructing an appropriate measure preserving mapping between pointsin the left half and those in the right half of the interval [0 , B a ), such that the lower bound chargedto each pair is at least 2Γ. α a By the definition of the α -invariant, the gain from α a ( y ) depends on k a ( y ) and k Ra ( y ). To simplifythe case, we impose further constraints below: ∀ k ≥ α RL ( k ) ≤ ∆ α DL ( k ) − α DL ( k + 1) , (C7) ∀ k ≥ α RR ( k ) ≤ ∆ α DR ( k ) − α DR ( k + 1) . (C8)They imply that for any point y with k a ( y ) = ∞ , the larger k Ra ( · ) is, the small α a ( y ) is bythe α -invariant. How large could k Ra ( · ) be? For this, recall the property of k a ( · ) from Lemma 4.It states that other than the subsets that have been deterministically assigned, the value of k a ( y )equals ˜ k a ( y ) defined as: ˜ k a ( y ) def = (cid:40) k min + 1 y < y ∗ ; k min y ≥ y ∗ . Here, recall that k min = min z ∈ [0 ,B a ) k a ( z ), and y ∗ denote the start point of the next subset inthe panoramic interval-level assignment.Then, the largest possible value of k Ra ( y ) is ˜ k a ( y ) − Y D denote the subset of of points in[0 , B a ) that are deterministically assigned: Y D def = (cid:8) y ∈ [0 , B a ) : y is deterministically assigned by the end of the algorithm (cid:9) . We have the following lower bound of the α variables.41 emma 34. For any advertiser a ∈ A and any point y ∈ [0 , B a ) , we have: α a ( y ) ≥ ˜ k a ( y ) (cid:88) (cid:96) =1 ∆ α R ( y, (cid:96) ) y / ∈ Y D ; ˜ k a ( y ) − (cid:88) (cid:96) =1 ∆ α R ( y, (cid:96) ) + ∆ α D ( y, ˜ k a ( y )) y ∈ Y D . In fact, the former case always holds with equality, although this observation is unimportantfor our analysis. β Variables and Charging to Points
We now turn to the lower bound of β i for impressions i ∈ S . Similar to the analysis of the basicalgorithm in Section 4, it depends on the matching status of impression i , in particular, whether i is semi-assigned or assigned to advertiser a and whether it is a deterministic or randomized round.Recall the definitions of sets N and R from the analysis of the basic algorithm: N def = (cid:8) i ∈ S : i is neither assigned nor semi-assigned to a (cid:9) ,R def = (cid:8) i ∈ S : i is semi-assigned to a (cid:9) . Further define the sum of the bids in these subsets as b N and b R respectively for future reference: b N def = (cid:88) i ∈ N b ai , b R def = (cid:88) i ∈ R b ai . We will find two subset Y N , Y R ⊆ [0 , B a ), and will distribute the lower bounds of β i ’s, i ∈ N and i ∈ R , to the points y ∈ Y N and y ∈ Y R respectively. More precisely, we shall define β ( y )’ssuch that: • Y N , Y R , and Y D are disjoint. • Y N , Y R , and Y D have total measure at least b a ( S ), i.e.: µ ( Y N ) + µ ( Y R ) + µ ( Y D ) ≥ b a ( S ) . (47) • The β ( y )’s lower bound the β i ’s, i.e.: (cid:88) i ∈ N β i ≥ (cid:90) Y N β ( y ) dy ; (48) (cid:88) i ∈ R β i ≥ (cid:90) Y R β ( y ) dy . (49) Construction of Y N and the Corresponding β ( y ) ’s. Consider the impressions i ∈ N . Thereare two different cases depending on whether i is a deterministic or randomized round. We claimthat in both cases: β i ≥ · ∆ Ra β i . i chooses advertiser a ∗ deterministically instead ofrandomizing between advertisers a ∗ and a , β i = ∆ Da ∗ β i ≥ ∆ Ra ∗ β i + ∆ Ra β i . Further by Lemma 32,∆ Da ∗ β i ≤ Ra ∗ β i . Cancelling ∆ Ra ∗ β i by combining the two inequalities leads to β i = ∆ Da ∗ β i ≥ Ra β i .Suppose it is a randomized round. By definition, both candidates in this round offer at least∆ Ra β i , or else the algorithm would have chosen advertiser a instead. Hence, β i ≥ Ra β i .Next, express ∆ Ra β i in terms of the state variables using the definition of ∆ Ra β i in Eqn. (44).Recall that k ia ( y )’s denote the values of the state variables when impression i arrives, and Y ai denotes the subset by the panoramic interval-level assignment, should i be semi-assigned or assignedto advertiser a when it arrives. If b ai is a small: β i ≥ · (cid:90) Y ai ∆ β RS ( k ia ( y ) + 1) dy . (50)If b ai is large: β i ≥ · (cid:90) Y ai ∆ β RL ( k ia ( y ) + 1) dy . (51)We need to further derive a lower bound w.r.t. the state variables k a ( y )’s at the end of thealgorithm. Its proof is almost verbatim to its counterpart in the basic algorithm, i.e., Lemma 35.We include it for completeness. Lemma 35.
For any subset ˜ Y ai with measure at most b ai , we have: (cid:90) Y ai ∆ β RS ( k ia ( y ) + 1) dy ≥ (cid:90) ˜ Y ai ∆ β RS ( k a ( y ) + 1) dy , (cid:90) Y ai ∆ β RL ( k ia ( y ) + 1) dy ≥ (cid:90) ˜ Y ai ∆ β RL ( k a ( y ) + 1) dy . Proof.
Since the panoramic interval-level assignment chooses a subset Y ai of measure b ai with theminimum k ia ( y )’s, i.e., Lemma 30, by the monotonicity of ∆ β RS ( · ) and ∆ β RL ( · ) in Eqn. (C1) andEqn. (C2), we have: (cid:90) Y ai ∆ β RS ( k ia ( y ) + 1) dy ≥ (cid:90) ˜ Y ai ∆ β RS ( k ia ( y ) + 1) dy , (cid:90) Y ai ∆ β RL ( k ia ( y ) + 1) dy ≥ (cid:90) ˜ Y ai ∆ β RL ( k ia ( y ) + 1) dy . Further observe that k a ( y ) ≥ k ia ( y ) for any y ∈ [0 , B a ). Applying the monotonicity of ∆ β RS ( · )and ∆ β RL ( · ) once again proves the lemma.Next, define Y N as: Y N def = (cid:2) y ∗ , y ∗ ⊕ Y D b N (cid:1) . (52)Here, recall that [ y, y ⊕ Y b ) denotes a subset of [0 , B a ) with measure b , obtained by scanningthrough the interval starting from y and exluding the points in Y .The values of β ( y )’s are determined by two possible lower bounds Ψ NL and Ψ NS for (cid:80) i ∈ N β i ,which we explain below. The first one corresponds to when there is a single large impression i ∈ N with bid b ai = b N . The second one corresponds to when there are one or two small impressions i ∈ N whose bids sum to b N , and in case of two impressions, one of them has the largest possiblesmall bid, i.e., B a . 43ormally, define the first lower bound as:Ψ NL def = 2 · (cid:90) Y N ∆ β RL ( y, k a ( y ) + 1) dy . To define the second lower bound, let b N = min { B a / , B N } and further define: Y N = (cid:2) y ∗ , y ∗ ⊕ Y D b N (cid:1) \ Y D ,Y N = (cid:2) y ∗ ⊕ Y D b N , y ∗ ⊕ Y D b N (cid:1) \ Y D . Observe that Y N = Y N \ Y N by definition. The second lower bound is then defined as:Ψ NS def = 2 (cid:90) Y N ∆ β RS (cid:0) y, k a ( y ) + 1 (cid:1) dy + 2 (cid:90) Y N ∆ β RS (cid:0) y (cid:9) Y D B a , k a ( y (cid:9) Y D B a ) + 1 (cid:1) dy . An explanation for the design of y in the second integral is due. This part is relevant only when b N > B a . That is, we consider two small bids: the first is B and the second is b N − B a . Here,mapping y to y (cid:9) Y D B a is a measure preserving map from Y N to the first b N − B a measure of Y N .By doing so, we explicitly use the fact that the second small bid’s β i can be lower bounded usinga subset starting from y ∗ in Lemma 35, even though the lower bound is eventually charged to Y N which does not start from y ∗ .By the monotonicity of parameters ∆ β RSL ( · ) and ∆ β RSR ( · ), i.e., Eqn. (C2), this is strictly betterthan the trivial bound without the mapping of y , which we used in the analysis of the basicalgorithm. This is the main power of small bids, which pays for the penalties they hadin the dual update rule.
The values of the $\beta(y)$'s are set according to the smaller of the two lower bounds:

$$\beta(y) \stackrel{\mathrm{def}}{=} \begin{cases} 2 \cdot \Delta\beta^{RL}\big(y, k_a(y)+1\big) & \Psi^{NL} < \Psi^{NS};\\ 2 \cdot \Delta\beta^{RS}\big(y, k_a(y)+1\big) & \Psi^{NL} \ge \Psi^{NS},\ y \in Y^N_1;\\ 2 \cdot \Delta\beta^{RS}\big(y \ominus_{Y^D} B_a/2,\ k_a(y \ominus_{Y^D} B_a/2)+1\big) & \Psi^{NL} \ge \Psi^{NS},\ y \in Y^N_2. \end{cases} \qquad (53)$$

Finally, we prove that the smaller of the two indeed lower bounds $\sum_{i \in N} \beta_i$.

Proof of Eqn. (48). We restate the inequality below:

$$\sum_{i \in N} \beta_i \ \ge\ \int_{Y^N} \beta(y)\,dy \ =\ \min\big\{ \Psi^{NL}, \Psi^{NS} \big\}.$$

Let $\Phi^{NS}(b)$, $0 \le b \le B_a/2$, denote the offer from advertiser $a$ for a small bid $b$ at the final state of the algorithm. Observe that it would be semi-assigned to $[\,y^*, y^* \oplus_{Y^D} b\,) \setminus Y^D$. We have:

$$\Phi^{NS}(b) \stackrel{\mathrm{def}}{=} \int_{[\,y^*,\, y^* \oplus_{Y^D} b\,) \setminus Y^D} \Delta\beta^{RS}\big(y, k_a(y)+1\big)\,dy.$$

Similarly, let $\Phi^{NL}(b)$, $B_a/2 < b \le B_a$, denote the offer from advertiser $a$ for a large bid $b$ at the final state of the algorithm:

$$\Phi^{NL}(b) \stackrel{\mathrm{def}}{=} \int_{[\,y^*,\, y^* \oplus_{Y^D} b\,) \setminus Y^D} \Delta\beta^{RL}\big(y, k_a(y)+1\big)\,dy.$$

For any $i \in N$, letting $\tilde Y_{ai} = [\,y^*, y^* \oplus_{Y^D} b_{ai}\,) \setminus Y^D$ in Lemma 35, we have:

$$\int_{Y_{ai}} \Delta\beta^{RS}\big(y, k^i_a(y)+1\big)\,dy \ge \Phi^{NS}(b_{ai}), \qquad \int_{Y_{ai}} \Delta\beta^{RL}\big(y, k^i_a(y)+1\big)\,dy \ge \Phi^{NL}(b_{ai}). \qquad (54)$$

Recall that the LHS of the above inequalities are half the lower bounds of $\beta_i$ for small and large bids respectively, due to Eqn. (50) and Eqn. (51). In other words, we lower bound each $\beta_i$ by what advertiser $a$ would have offered if impression $i$ arrived at the end.

The rest of the proof further transforms the sum of these lower bounds for $\beta_i$, $i \in N$, into the stated bound in the lemma. First, we may assume w.l.o.g. that $b^N \le B_a - \mu(Y^D)$. Otherwise, we could decrease the bids of some impressions in $N$ such that the LHS of Eqn. (48) decreases, while the RHS stays the same.

Further, it is w.l.o.g. to merge small bids into at most two; in the case of two small bids, it is w.l.o.g. that the larger one has size $B_a/2$. This is because $\Phi^{NS}(\cdot)$ is concave by Lemma 31. Formally, for any two small bids $b \ge b'$ and any $\delta \ge 0$, concavity implies:

$$\Phi^{NS}(b) + \Phi^{NS}(b') \ge \Phi^{NS}(b + \delta) + \Phi^{NS}(b' - \delta).$$

Then, we may let $\delta = b'$ if $b + b' \le B_a/2$, and let $\delta = B_a/2 - b$ otherwise. Repeating this operation proves the claim.

Finally, we claim that it is w.l.o.g. to assume having either only small bids, or a single large bid. Observe that there can be at most one large bid by definition. In the presence of both large and small bids, and after the aforementioned merging of small bids, it must be the case that we have one large bid, say $b > B_a/2$, and one small bid $b' \le B_a/2$.

Since both $\Phi^{NS}(\cdot)$ and $\Phi^{NL}(\cdot)$ are concave by Lemma 31, we either have:

$$\Phi^{NL}(b) + \Phi^{NS}(b') \ge \Phi^{NL}(b + \delta) + \Phi^{NS}(b' - \delta) \quad \text{for any } 0 \le \delta \le b',$$

or:

$$\Phi^{NL}(b) + \Phi^{NS}(b') \ge \Phi^{NL}(b - \delta) + \Phi^{NS}(b' + \delta) \quad \text{for any } 0 \le \delta \le b - B_a/2.$$

The range of $\delta$ in the second case is chosen such that the large bid does not become small. Observe that the small bid cannot become large without letting the large bid become small, since they sum to at most $B_a$.

In the former case, we let $\delta = b'$ to eliminate the small impression. In the latter case, we let $\delta = b - B_a/2$. Then, the claim follows by the observation that, conditioned on having the same size $B_a/2$, downgrading the large bid into a small bid leads to a smaller offer and thus a smaller lower bound for the corresponding $\beta_i$. The observation follows by the definition of the $\beta$ increments in Eqn. (53), and the comparison of $\beta$ increments for large and small bids in Lemma 29.

Reading Guide.
The remaining parts of the subsection, including the construction of $Y^R$ and the corresponding $\beta(y)$'s, the disjointness of $Y^N$, $Y^R$, and $Y^D$, and their measure bounds, are almost verbatim to the counterparts in the basic algorithm. We include them below for completeness. Nonetheless, readers may want to skip to the next subsection.

Construction of $Y^R$ and the Corresponding $\beta(y)$'s.

Since the algorithm does not choose to match to advertiser $a$ deterministically, $\beta_i \ge \Delta^D_a \beta_i$. By the definition of $\Delta^D_a \beta_i$ in Eqn. (45):

$$\beta_i \ge \int_{Y_{ai}} \Delta\beta^D\big(y, k^i_a(y)+1\big)\,dy. \qquad (55)$$

We need to further derive a lower bound w.r.t. the $k_a(y)$'s at the end of the algorithm. The next lemma is similar to Lemma 35 in the previous case, but more generally considers arbitrary $\hat k_a(y) \ge k^i_a(y)$ instead of only the $k_a(y)$ at the end of the algorithm.

Lemma 36.
Consider any $\hat k_a(y)$'s such that $\hat k_a(y) \ge k^i_a(y)$ for any $y \in [0, B_a)$. Then, for any subset $\tilde Y_{ai}$ with measure at most $b_{ai}$:

$$\int_{Y_{ai}} \Delta\beta^D\big(y, k^i_a(y)+1\big)\,dy \ge \int_{\tilde Y_{ai}} \Delta\beta^D\big(y, \hat k_a(y)+1\big)\,dy.$$

Proof.
Since the panoramic interval-level assignment chooses a subset $Y_{ai}$ of measure $b_{ai}$ with the minimum and leftmost $k^i_a(y)$'s, combining with the monotonicity of $\Delta\beta^D$ in Eqn. (C3):

$$\int_{Y_{ai}} \Delta\beta^D\big(y, k^i_a(y)+1\big)\,dy \ge \int_{\tilde Y_{ai}} \Delta\beta^D\big(y, k^i_a(y)+1\big)\,dy.$$

The lemma then follows by the assumption that $\hat k_a(y) \ge k^i_a(y)$ for any $y \in [0, B_a)$, and by the monotonicity of $\Delta\beta^D$ in Eqn. (C3).

For any $i \in R$, define $\tilde Y_{ai}$ as:

$$\tilde Y_{ai} = \Big[\, y^* \ominus_{Y^D} \sum_{i' \in R : i' \ge i} b_{ai'},\ \ y^* \ominus_{Y^D} \sum_{i' \in R : i' > i} b_{ai'} \Big) \setminus Y^D. \qquad (56)$$

In other words, we scan backwards through the interval $[0, B_a)$ starting from $y^*$, treating the interval as a circle by gluing its endpoints. Then, we construct the $\tilde Y_{ai}$'s for $i \in R$ one at a time by their arrival order from latest to earliest, letting each be a subset excluding $Y^D$ with measure up to $b_{ai}$. If $\sum_{i \in R} b_{ai} \le B_a - \mu(Y^D)$, which we consider the canonical case, these would be the panoramic interval-level assignments if these $i \in R$ arrived at the end of the instance, assuming the same final state of the algorithm. By the definition of the boundary case, we stop scanning through $[0, B_a)$ after a full circle; therefore, the above $\tilde Y_{ai}$'s are disjoint.

Define $Y^R$ and the corresponding $\beta(y)$ as:

$$Y^R \stackrel{\mathrm{def}}{=} \bigcup_{i \in R} \tilde Y_{ai} \setminus Y^N, \qquad \forall y \in Y^R:\ \beta(y) \stackrel{\mathrm{def}}{=} \Delta\beta^D\big(y, k_a(y)\big). \qquad (57)$$

We remark that $Y^R$ can be simplified as $Y^R = [\, y^* \ominus_{Y^D} b^R,\ y^* ) \setminus Y^D$ if $b^R + b^N + \mu(Y^D) \le B_a$, which we consider the canonical case of the analysis.

Proof of Eqn. (49). We restate the inequality below:

$$\sum_{i \in R} \beta_i \ge \int_{Y^R} \beta(y)\,dy = \int_{Y^R} \Delta\beta^D\big(y, k_a(y)\big)\,dy.$$

For any $i \in R$, define $k^{-i}_a(y)$ by considering what the state variables of advertiser $a$ would have been before the arrival of $i$ if the impressions in $R$ were the latest ones in the instance. More precisely, for any $i \in R$, let:

$$k^{-i}_a(y) \stackrel{\mathrm{def}}{=} \begin{cases} k_a(y) - 1 & y \in \big[\, y^* \ominus_{Y^D} \sum_{i' \in R : i' \ge i} b_{ai'},\ y^* \big) \setminus Y^D;\\ k_a(y) & \text{otherwise.} \end{cases}$$

Intuitively, these are the largest possible values of the $k^i_a(y)$'s.
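To see the backward scan of Eqn. (56) concretely, here is a discretized sketch (encoding and names are ours); stopping after one full circle is exactly what makes the resulting sets disjoint.

```python
# A sketch of Eqn. (56): starting just left of y* and moving backwards
# (cyclically), allocate to each i in R, from the latest arrival to the
# earliest, a set of cells of measure up to b_ai avoiding Y^D.
def backward_scan(y_star: int, bids_latest_first: list, B: int,
                  excluded: set) -> list:
    assignments = []
    y = (y_star - 1) % B
    steps = 0
    for b in bids_latest_first:                 # latest arrival first
        cells = []
        while len(cells) < b and steps < B:     # stop after a full circle
            if y not in excluded:
                cells.append(y)
            y = (y - 1) % B
            steps += 1
        assignments.append(cells)
    return assignments

# Example: B_a = 8 cells, y* = 5, Y^D = {1}, two impressions in R with
# bids 2 and 3; the resulting sets are disjoint and skip Y^D.
print(backward_scan(5, [2, 3], 8, {1}))         # [[4, 3], [2, 0, 7]]
```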
We restate Lemma 12 below, which still holds in the hybrid algorithm with a verbatim proof.

Lemma 37. For any $i \in R$ and any $y \in [0, B_a)$: $k^{-i}_a(y) \ge k^i_a(y)$.

Consider any $i \in R$. By definition, $\tilde Y_{ai}$ is a subset with measure at most $b_{ai}$. Further, Lemma 37 above allows us to apply Lemma 36, which by the monotonicity of $\Delta\beta^D(\cdot)$ in Eqn. (C3) gives:

$$\int_{Y_{ai}} \Delta\beta^D\big(y, k^i_a(y)+1\big)\,dy \ge \int_{\tilde Y_{ai}} \Delta\beta^D\big(y, k^{-i}_a(y)+1\big)\,dy.$$

Finally, by $k^{-i}_a(y) = k_a(y) - 1$ for $y \in \tilde Y_{ai}$:

$$\int_{Y_{ai}} \Delta\beta^D\big(y, k^i_a(y)+1\big)\,dy \ge \int_{\tilde Y_{ai}} \Delta\beta^D\big(y, k_a(y)\big)\,dy. \qquad (58)$$

Eqn. (49) then follows by Eqn. (55), the above inequality in Eqn. (58), and the definition of $Y^R$ and the corresponding $\beta(y)$ for $y \in Y^R$ in Eqn. (57), through a sequence of inequalities as follows:

$$\sum_{i \in R} \beta_i \ge \sum_{i \in R} \int_{Y_{ai}} \Delta\beta^D\big(y, k^i_a(y)+1\big)\,dy \qquad (\text{Eqn. (55)})$$
$$\ge \sum_{i \in R} \int_{\tilde Y_{ai}} \Delta\beta^D\big(y, k_a(y)\big)\,dy \qquad (\text{Eqn. (58)})$$
$$\ge \int_{Y^R} \beta(y)\,dy. \qquad (\text{Eqn. (57)})$$

Disjointness.
The sets $Y^N$ and $Y^R$ can be written as:

$$Y^N = \big[\, y^*,\ y^* \oplus_{Y^D} b^N \,\big) \setminus Y^D, \qquad Y^R = \big[\, y^* \ominus_{Y^D} b^R,\ y^* \,\big) \setminus (Y^D \cup Y^N).$$

Hence, they are disjoint by definition.
Measure Bound. If $b^N + b^R + \mu(Y^D) > B_a$, the union of $Y^D$, $[\, y^*, y^* \oplus_{Y^D} b^N)$, and $[\, y^* \ominus_{Y^D} b^R, y^*)$ covers $[0, B_a)$. Further, by the above equivalent forms of $Y^N$ and $Y^R$, the union of $Y^N$, $Y^R$, and $Y^D$ also covers $[0, B_a)$. Then, the measure bound follows by:

$$\mu(Y^N) + \mu(Y^R) + \mu(Y^D) \ge B_a \ge b_a(S).$$

Otherwise, $Y^R$ simplifies as $[\, y^* \ominus_{Y^D} b^R,\ y^*) \setminus Y^D$. We have:

$$\mu(Y^N) = b^N = \sum_{i \in N} b_{ai}, \qquad \mu(Y^R) = b^R = \sum_{i \in R} b_{ai}.$$

Any impression $i \in S$ that is not in $N$ or $R$ must be deterministically assigned. Hence:

$$\mu(Y^D) \ge \sum_{i \in S \setminus (N \cup R)} b_{ai}.$$

Together we have:

$$\mu(Y^R) + \mu(Y^N) + \mu(Y^D) \ge \sum_{i \in S} b_{ai} \ge b_a(S).$$

Let us summarize the point-level lower bounds from $\alpha_a(y)$ and $\beta(y)$ below.

• If $y \in Y^N$ (large-bid subcase), by Lemma 34 and Eqn. (53), $\alpha_a(y) + \beta(y)$ is at least:

$$\psi^{NLL}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RL}(\ell) + 2 \cdot \Delta\beta^{RLL}\big(\tilde k_a(y)+1\big) \qquad 0 \le y < B_a/2;$$
$$\psi^{NLR}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RR}(\ell) + 2 \cdot \Delta\beta^{RLR}\big(\tilde k_a(y)+1\big) \qquad B_a/2 \le y < B_a.$$

• If $y \in Y^N$ and further $y \in Y^N_1$ (small-bid subcase, first small bid), by Lemma 34 and Eqn. (53), $\alpha_a(y) + \beta(y)$ is at least:

$$\psi^{NS_1L}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RL}(\ell) + 2 \cdot \Delta\beta^{RSL}\big(\tilde k_a(y)+1\big) \qquad 0 \le y < B_a/2;$$
$$\psi^{NS_1R}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RR}(\ell) + 2 \cdot \Delta\beta^{RSR}\big(\tilde k_a(y)+1\big) \qquad B_a/2 \le y < B_a.$$

• If $y \in Y^N$ and further $y \in Y^N_2$ (small-bid subcase, second small bid), by Lemma 34 and Eqn. (53), $\alpha_a(y) + \beta(y)$ is at least:

$$\sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RL}(\ell) + 2 \cdot \Delta\beta^{RS}\big(y \ominus_{Y^D} B_a/2,\ \tilde k_a(y \ominus_{Y^D} B_a/2)+1\big) \qquad 0 \le y < B_a/2;$$
$$\sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RR}(\ell) + 2 \cdot \Delta\beta^{RS}\big(y \ominus_{Y^D} B_a/2,\ \tilde k_a(y \ominus_{Y^D} B_a/2)+1\big) \qquad B_a/2 \le y < B_a.$$

This case is more involved, since it is unclear whether the point $y \ominus_{Y^D} B_a/2$ is on the left or the right. If point $y$ is on the left half, i.e., $y < B_a/2$, both $y^*$ and $y \ominus_{Y^D} B_a/2$ are on the right of $y$. Hence, $\tilde k_a(y \ominus_{Y^D} B_a/2) \le \tilde k_a(y) - 1$. By the monotonicity of $\Delta\beta^{RS}(y, k)$:

$$\alpha_a(y) + \beta(y) \ge \psi^{NS_2L}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RL}(\ell) + 2 \cdot \Delta\beta^{RSR}\big(\tilde k_a(y)\big) \qquad 0 \le y < B_a/2.$$

Next suppose point $y$ is on the right half, i.e., $B_a/2 \le y < B_a$. If the point $y \ominus_{Y^D} B_a/2$ is on the left half, then $\tilde k_a(y \ominus_{Y^D} B_a/2) \le \tilde k_a(y)$ and $\beta(y) \ge 2 \cdot \Delta\beta^{RSL}(\tilde k_a(y) + 1)$. Otherwise, i.e., the point $y \ominus_{Y^D} B_a/2$ is on the right, $\tilde k_a(y \ominus_{Y^D} B_a/2) \le \tilde k_a(y) - 1$, and $\beta(y) \ge 2 \cdot \Delta\beta^{RSR}(\tilde k_a(y))$, which is even larger than the previous case by the monotonicity of $\Delta\beta^{RS}(y, k)$. In sum:

$$\alpha_a(y) + \beta(y) \ge \psi^{NS_2R}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RR}(\ell) + 2 \cdot \Delta\beta^{RSL}\big(\tilde k_a(y)+1\big) \qquad B_a/2 \le y < B_a.$$

• If $y \in Y^R$, by Lemma 34 and Eqn. (57), $\alpha_a(y) + \beta(y)$ is at least:

$$\psi^{RL}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RL}(\ell) + \Delta\beta^{DL}\big(\tilde k_a(y)\big) \qquad 0 \le y < B_a/2;$$
$$\psi^{RR}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)} \Delta\alpha^{RR}(\ell) + \Delta\beta^{DR}\big(\tilde k_a(y)\big) \qquad B_a/2 \le y < B_a.$$

• If $y \in Y^D$, by Lemma 34 and defining $\beta(y) = 0$, $\alpha_a(y) + \beta(y)$ is at least:

$$\psi^{DL}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)-1} \Delta\alpha^{RL}(\ell) + \Delta\alpha^{DL}\big(\tilde k_a(y)\big) \qquad 0 \le y < B_a/2;$$
$$\psi^{DR}\big(\tilde k_a(y)\big) \stackrel{\mathrm{def}}{=} \sum_{\ell=1}^{\tilde k_a(y)-1} \Delta\alpha^{RR}(\ell) + \Delta\alpha^{DR}\big(\tilde k_a(y)\big) \qquad B_a/2 \le y < B_a.$$
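Each ψ above is just a prefix sum of α increments plus one β (or α) increment. The following bookkeeping sketch (our own encoding, with hypothetical 0-indexed parameter arrays such that `da_RL[l - 1]` stores $\Delta\alpha^{RL}(\ell)$, padded with zeros beyond $k_{\max}$) records three representative cases:

```python
def psi_NLL(k: int, da_RL: list, db_RLL: list) -> float:
    # sum_{l=1}^{k} dalpha_RL(l) + 2 * dbeta_RLL(k + 1)
    return sum(da_RL[:k]) + 2 * db_RLL[k]

def psi_RL(k: int, da_RL: list, db_DL: list) -> float:
    # sum_{l=1}^{k} dalpha_RL(l) + dbeta_DL(k)
    return sum(da_RL[:k]) + db_DL[k - 1]

def psi_DL(k: int, da_RL: list, da_DL: list) -> float:
    # sum_{l=1}^{k-1} dalpha_RL(l) + dalpha_DL(k)
    return sum(da_RL[:k - 1]) + da_DL[k - 1]

# The right-half variants (psi_NLR, psi_RR, psi_DR) and the small-bid
# variants are analogous, swapping in da_RR, db_RLR, db_RSL, db_RSR,
# db_DR, and da_DR.
```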
Approximate Dual Feasibility Fails Locally. In the previous analysis of the basic algorithm, approximate dual feasibility holds locally in that $\alpha_a(y) + \beta(y) \ge \Gamma$ for any $y \in Y^N \cup Y^R \cup Y^D$. If we followed the same strategy, the above functions from $\psi^{NLL}(k)$ to $\psi^{DR}(k)$ would need to be at least $\Gamma$ for all possible values of $k = \tilde k_a(y)$. This is impossible, however, for any nontrivial competitive ratio $\Gamma > 0.5$. This shall not be surprising, since it goes against the strategy of handling the left and right halves of the interval $[0, B_a)$ differently in the hybrid algorithm.

New Plan: Pairing Points Between Left and Right.
Based on the above discussion, an amortization between left and right is needed. Concretely, we will design a measure-preserving map $h : [0, B_a/2) \mapsto [B_a/2, B_a)$ such that:

$$\forall y \in \big[0, B_a/2\big): \quad \alpha_a(y) + \beta(y) + \alpha_a(h(y)) + \beta(h(y)) \ge 2\Gamma. \qquad (59)$$

This would be sufficient for approximate dual feasibility if the union of $Y^N$, $Y^R$, and $Y^D$ covers $[0, B_a)$. To make the above pairing idea work in the general case, we further impose the following constraint on the increments of the $\beta$ variables:

$$\Delta\beta^{RSL}(1) \le \frac{\Gamma}{2}. \qquad (C9)$$

Lemma 38.
Eqn. (59) implies approximate dual feasibility.

Proof.
First consider the case when the union of $Y^N$, $Y^R$, and $Y^D$ covers $[0, B_a)$. Then:

$$\int_0^{B_a} \alpha_a(y)\,dy + \sum_{i \in S} \beta_i \ge \int_0^{B_a} \alpha_a(y)\,dy + \int_{Y^N \cup Y^R \cup Y^D} \beta(y)\,dy \qquad (\text{Eqns. (48) and (49)})$$
$$= \int_0^{B_a} \big(\alpha_a(y) + \beta(y)\big)\,dy = \int_0^{B_a/2} \big(\alpha_a(y) + \beta(y) + \alpha_a(h(y)) + \beta(h(y))\big)\,dy \qquad (h \text{ is measure preserving})$$
$$\ge \int_0^{B_a/2} 2\Gamma\,dy = \Gamma \cdot B_a. \qquad (\text{Eqn. (59)})$$

Next, consider an instance in which this does not hold, i.e.:

$$\sum_{i \in S} b_{ai} \le \mu\big(Y^N \cup Y^R \cup Y^D\big) < B_a.$$

Add to $S$ a set of impressions that are small bids w.r.t. advertiser $a$, summing to $B_a - \sum_{i \in S} b_{ai}$. Then, the RHS of approximate dual feasibility increases by $\Gamma \cdot (B_a - \sum_{i \in S} b_{ai})$, while the LHS increases by at most this amount due to Eqn. (C9) and the monotonicity of $\Delta\beta^{RSL}(\cdot)$ and $\Delta\beta^{RSR}(\cdot)$ in Eqn. (C2). In other words, the new impressions make approximate dual feasibility harder to satisfy, and prove the lemma by reducing it to the case when the union of $Y^N$, $Y^R$, and $Y^D$ covers $[0, B_a)$.

Naïve Measure Preserving Map.
Consider the trivial measure-preserving map $h(y) = y + B_a/2$. By the property of $k_a(\cdot)$ in Lemma 4, there are two possible combinations of $\tilde k_a(y)$ and $\tilde k_a(h(y))$:

• $\tilde k_a(y) = k_a^{\min} + 1$ and $\tilde k_a(h(y)) = k_a^{\min}$; or

• $\tilde k_a(y) = k_a^{\min}$ and $\tilde k_a(h(y)) = k_a^{\min}$.

Moreover, we can rule out an impossible case by the policy of interval-level assignment.

Lemma 39 (Impossible Case). For any point $y$ on the left half, and any point $h(y)$ on the right half, it is impossible that $y \in Y^N$ and $\tilde k_a(y) = k_a^{\min} + 1$, while $h(y) \in Y^R$ and $\tilde k_a(h(y)) = k_a^{\min}$.

Proof. Suppose for contradiction that there are such points $y$ and $h(y)$. By $y \in Y^N$, we have $[y^*, y) \subseteq Y^N$. Further, by $\tilde k_a(y) = k_a^{\min} + 1$, it follows from Lemma 4 that $y < y^*$. Hence, by the panoramic treatment of the notation for intervals:

$$[y^*, y) = [0, y) \cup [y^*, B_a) \subseteq Y^N.$$

Similarly, by $h(y) \in Y^R$, we have $[h(y), y^*) \subseteq Y^R$. Further, by $\tilde k_a(h(y)) = k_a^{\min}$, it follows from Lemma 4 that $h(y) \ge y^*$. Hence, by the panoramic treatment of notation:

$$[h(y), y^*) = [0, y^*) \cup [h(y), B_a) \subseteq Y^R.$$

Together we conclude that $Y^N \cap Y^R$ is nonempty, contradicting their construction.

Table 1 summarizes the worst-case bound for the LHS of Eqn. (59). Here, whenever $y$, or $h(y)$, or both belong to $Y^N$, we use the worst-case bound, i.e., when they are in the small-bid subcase and in $Y^N_2$. This is indeed the main drawback of the trivial mapping, since we cannot exploit the extra gain from the $Y^N_1$ case of small bids. As a result, these constraints are still too restrictive to get any competitive ratio better than $0.5$.

                    | h(y): Y^N, k                          | h(y): Y^R, k                   | h(y): Y^D, k
y: Y^N, k           | ψ^{NS_2L}(k) + ψ^{NS_2R}(k)           | ψ^{NS_2L}(k) + ψ^{RR}(k)       | ψ^{NS_2L}(k) + ψ^{DR}(k)
y: Y^N, k+1         | ψ^{NS_2L}(k+1) + ψ^{NS_2R}(k)         | (impossible)                   | ψ^{NS_2L}(k+1) + ψ^{DR}(k)
y: Y^R, k           | ψ^{RL}(k) + ψ^{NS_2R}(k)              | ψ^{RL}(k) + ψ^{RR}(k)          | ψ^{RL}(k) + ψ^{DR}(k)
y: Y^R, k+1         | ψ^{RL}(k+1) + ψ^{NS_2R}(k)            | ψ^{RL}(k+1) + ψ^{RR}(k)        | ψ^{RL}(k+1) + ψ^{DR}(k)
y: Y^D, k           | ψ^{DL}(k) + ψ^{NS_2R}(k)              | ψ^{DL}(k) + ψ^{RR}(k)          | ψ^{DL}(k) + ψ^{DR}(k)
y: Y^D, k+1         | ψ^{DL}(k+1) + ψ^{NS_2R}(k)            | ψ^{DL}(k+1) + ψ^{RR}(k)        | ψ^{DL}(k+1) + ψ^{DR}(k)

Table 1: Approximate dual feasibility constraints with the naïve map $h(y) = y + B_a/2$. Rows are combinations of $y$'s type and $\tilde k_a(y)$. Columns are combinations of $h(y)$'s type and $\tilde k_a(h(y))$. Write $\tilde k_a(h(y))$ as $k$ for brevity. The range is $k \ge 1$; the first, second, fourth, and sixth cells in the first column further include $k = 0$. This is because $y \notin Y^R \cup Y^D$ when $\tilde k_a(y) = 0$ and $h(y) \notin Y^R \cup Y^D$ when $\tilde k_a(h(y)) = 0$. Each formula in the table shall be at least $2\Gamma$.

Our Measure Preserving Map.
Next, we design a better mapping to improve the cases when $y, h(y) \in Y^N$. In particular, we shall rule out the case when both of them are in $Y^N_2$, refining the top-left cells in Table 2. Intuitively, this is possible because the total measure of $Y^N_2$ is at most $B_a/2$. Concretely, we construct the measure-preserving map $h$ in four steps as follows.

1. For any $y \in [0, B_a/2)$ such that $y \in Y^N_2$ (which means $y \ominus_{Y^D} B_a/2 \in Y^N_1$), and $y \ominus_{Y^D} B_a/2 \in [B_a/2, B_a)$, map $y$ to $h(y) \stackrel{\mathrm{def}}{=} y \ominus_{Y^D} B_a/2$.

2. For any $y \in [0, B_a/2)$ such that $y \oplus_{Y^D} B_a/2 \in Y^N_2$ (which means $y \in Y^N_1$), and $y \oplus_{Y^D} B_a/2 \in [B_a/2, B_a)$, map $y$ to $h(y) \stackrel{\mathrm{def}}{=} y \oplus_{Y^D} B_a/2$.

In other words, the first two steps consider pairs of points $y_1 \in Y^N_2$ and $y_2 = y_1 \ominus_{Y^D} B_a/2 \in Y^N_1$ such that one of them is on the left half and the other is on the right half; the measure-preserving map $h$ then maps the one on the left to the one on the right. We show an example in Figure 4.
3. For the points $y \in [0, B_a/2)$ with $y \in Y^N_2$ whose map $h(y)$ remains undefined after the first two steps, map them to the unmapped points in $[B_a/2, B_a) \setminus Y^N_2$ in an arbitrary measure-preserving way.

The third step ensures that the points in $Y^N_2$ are not mapped to each other. Why is this possible? Suppose a measure of $\mu$ has been mapped from each half of the interval $[0, B_a)$ in the first two steps. Recall that the first two steps only define a mapping between $Y^N_1$-$Y^N_2$ pairs. Further, by the observation that $Y^N_2$ has measure at most $B_a/2$, the total measure of the unmapped points in $Y^N_2$ is at most $B_a/2 - \mu$. Suppose $\mu_L$ of this measure is on the left half. Then, $Y^N_2$ has a total measure of at most $B_a/2 - \mu - \mu_L$ on the right. Finally, since the unmapped measure on the right half is precisely $B_a/2 - \mu$, we conclude that the unmapped measure on the right half excluding $Y^N_2$ is at least $(B_a/2 - \mu) - (B_a/2 - \mu - \mu_L) = \mu_L$.

4. For the points $y \in [0, B_a/2)$ whose map $h(y)$ is still undefined, map them to the unmapped points in $[B_a/2, B_a)$ in an arbitrary measure-preserving way.

Next, we present the main property of the above measure-preserving map $h$.

Lemma 40 (Refined $Y^N$-$Y^N$ Small-bid Cases). Suppose $y$ and $h(y)$ are both in $Y^N$ in the small-bid subcase. Then, it must be one of the following cases:

• $y \in Y^N_1$ with $\tilde k_a(y) = k_a^{\min}$, and $h(y) \in Y^N_2$ with $\tilde k_a(h(y)) = k_a^{\min}$;

• $y \in Y^N_1$ with $\tilde k_a(y) = k_a^{\min} + 1$, and $h(y) \in Y^N_2$ with $\tilde k_a(h(y)) = k_a^{\min} + 1$;

• $y \in Y^N_2$ with $\tilde k_a(y) = k_a^{\min} + 1$, and $h(y) \in Y^N_1$ with $\tilde k_a(h(y)) = k_a^{\min}$.

Proof. By definition, two points in $Y^N_2$ cannot be matched. Further, if $Y^N_2$ is nonempty, then $Y^N_1$ must have measure $B_a/2$; the bound $\mu(Y^N_1) \le B_a/2$ holds with equality in the case of two small bids, and with strict inequality only in the case of a single small bid strictly smaller than $B_a/2$, for which $Y^N_2$ is empty. Hence, any measure-preserving map that does not map points in $Y^N_2$ to each other must not map points in $Y^N_1$ to each other either. In other words, two points in $Y^N_1$ cannot be matched. Hence, we have one of $y$ and $h(y)$ in $Y^N_1$ and the other in $Y^N_2$.

First suppose $y \in Y^N_1$ and $h(y) \in Y^N_2$. By definition, $[y, h(y)) \setminus Y^D \subseteq Y^N$, which does not contain $y^*$ in it. In other words, $y$ and $h(y)$, with $y < B_a/2 \le h(y)$, must be on the same side of $y^*$. Then, by Lemma 4, either $\tilde k_a(y) = \tilde k_a(h(y)) = k_a^{\min}$, which happens if they are both on the right of $y^*$, or $\tilde k_a(y) = \tilde k_a(h(y)) = k_a^{\min} + 1$, which happens if they are both on the left of $y^*$.

Next suppose $y \in Y^N_2$ and $h(y) \in Y^N_1$. Similarly, by definition $[h(y), y) \subseteq Y^N$, which does not contain $y^*$ in it; here recall the panoramic treatment of notation. By $y < B_a/2 \le h(y)$ we further conclude that $[h(y), y) = [0, y) \cup [h(y), B_a) \subseteq Y^N$. In other words, $y$ is on the left of $y^*$ and $h(y)$ is on the right of $y^*$. Then, by Lemma 4, $\tilde k_a(y) = k_a^{\min} + 1$ and $\tilde k_a(h(y)) = k_a^{\min}$.

Hence, for the top-left cells in the table, which concern the cases when both $y$ and $h(y)$ are in $Y^N$, it suffices to consider the large-bid subcase and the small-bid cases stated in the above lemma. We formulate the refined constraints in Table 2.
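The only property of $h$ that the analysis uses is that no two points of $Y^N_2$ are ever paired. The following discretized sketch (our own abstraction, ignoring the $\oplus_{Y^D}$ alignment of steps 1-2 and keeping only the priority of step 3) shows why $\mu(Y^N_2) \le B_a/2$ makes such a pairing feasible:

```python
def pair_halves(left_n2: set, right_n2: set, half: int) -> dict:
    """Pair each cell of [0, half) with a cell of [half, 2*half) such
    that no pair lies entirely inside Y^N_2."""
    assert len(left_n2) + len(right_n2) <= half   # mu(Y^N_2) <= B_a / 2
    plain = [y for y in range(half, 2 * half) if y not in right_n2]
    rest = [y for y in range(half, 2 * half) if y in right_n2]
    h = {}
    # Serve Y^N_2 cells on the left first: by the counting above, there
    # are always enough non-Y^N_2 partners on the right for them.
    for y in sorted(range(half), key=lambda c: c not in left_n2):
        h[y] = plain.pop() if plain else rest.pop()
    return h

h = pair_halves({0, 1}, {4}, 4)
assert all(not (y in {0, 1} and h[y] in {4}) for y in h)
print(h)   # e.g., {0: 7, 1: 6, 2: 5, 3: 4}
```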
Finally, to avoid having an infinite number of parameters and constraints, we set all parameters, including $\Delta\alpha^{RL}(k)$, $\Delta\alpha^{RR}(k)$, $\Delta\alpha^{DL}(k)$, $\Delta\alpha^{DR}(k)$, $\Delta\beta^{RLL}(k)$, $\Delta\beta^{RLR}(k)$, $\Delta\beta^{RSL}(k)$, $\Delta\beta^{RSR}(k)$, $\Delta\beta^{DL}(k)$, and $\Delta\beta^{DR}(k)$, to be $0$ when $k > k_{\max}$ for some sufficiently large integer $k_{\max}$. Doing so does not violate the regularity constraints, including the monotonicity constraints in Equations (C1), (C2), and (C3), the superiority of randomized rounds in Eqn. (C4), and the nonnegativity in Equations (C5) and (C6), so long as the nonzero parameters for $1 \le k \le k_{\max}$ satisfy them. Further, the same applies to the constraints for simplifying the lower bounds of the $\alpha$ variables in Equations (C7) and (C8). Further, the simplifying constraint in Eqn. (C9), which allows us to prove approximate dual feasibility by pairing points between left and right, does not involve large $k$ and thus is unaffected. Finally, the approximate dual feasibility constraints in Table 2 for $k > k_{\max}$ simplify to a single boundary constraint as follows:

$$\sum_{k=1}^{k_{\max}} \big( \Delta\alpha^{RL}(k) + \Delta\alpha^{RR}(k) \big) \ge 2\Gamma. \qquad (C10)$$

                    | h(y): Y^N, k                          | h(y): Y^R, k                   | h(y): Y^D, k
y: Y^N, k           | ψ^{NS_1L}(k) + ψ^{NS_2R}(k),          | ψ^{NS_2L}(k) + ψ^{RR}(k)       | ψ^{NS_2L}(k) + ψ^{DR}(k)
                    | ψ^{NLL}(k) + ψ^{NLR}(k)               |                                |
y: Y^N, k+1         | ψ^{NS_2L}(k+1) + ψ^{NS_1R}(k),        | (impossible)                   | ψ^{NS_2L}(k+1) + ψ^{DR}(k)
                    | ψ^{NLL}(k+1) + ψ^{NLR}(k)             |                                |
y: Y^R, k           | ψ^{RL}(k) + ψ^{NS_2R}(k)              | ψ^{RL}(k) + ψ^{RR}(k)          | ψ^{RL}(k) + ψ^{DR}(k)
y: Y^R, k+1         | ψ^{RL}(k+1) + ψ^{NS_2R}(k)            | ψ^{RL}(k+1) + ψ^{RR}(k)        | ψ^{RL}(k+1) + ψ^{DR}(k)
y: Y^D, k           | ψ^{DL}(k) + ψ^{NS_2R}(k)              | ψ^{DL}(k) + ψ^{RR}(k)          | ψ^{DL}(k) + ψ^{DR}(k)
y: Y^D, k+1         | ψ^{DL}(k+1) + ψ^{NS_2R}(k)            | ψ^{DL}(k+1) + ψ^{RR}(k)        | ψ^{DL}(k+1) + ψ^{DR}(k)

Table 2: Approximate dual feasibility constraints with our measure-preserving map. Rows are combinations of $y$'s type and $\tilde k_a(y)$. Columns are combinations of $h(y)$'s type and $\tilde k_a(h(y))$. Write $\tilde k_a(h(y))$ as $k$ for brevity. The range is $k \ge 1$; the first, second, fourth, and sixth cells in the first column further include $k = 0$. This is because $y \notin Y^R \cup Y^D$ when $\tilde k_a(y) = 0$ and $h(y) \notin Y^R \cup Y^D$ when $\tilde k_a(h(y)) = 0$. Each formula in the table shall be at least $2\Gamma$.

The finite LP has $O(k_{\max})$ constraints and is formulated below:

maximize      Γ
subject to    Regularity Constraints (C1), (C2), (C3), (C4), (C5), (C6)      1 ≤ k ≤ k_max
              Simplifying Constraints (C7), (C8)                             1 ≤ k ≤ k_max
              Simplifying Constraint (C9)
              Approximate Dual Feasibility Constraints in Table 2            k = 0 or 1 ≤ k ≤ k_max
              Boundary Constraint (C10)

Finally, we set $k_{\max} = 20$ and solve the LP using the PuLP package in Python. This gives a set of parameters with $\Gamma > 0.5016$, under which $\alpha_a \ge 0$ and $\beta_i \ge 0$.
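For concreteness, here is a heavily abridged sketch of such a program, assuming the PuLP API; only a few representative constraint families are spelled out, so the value it prints is not meaningful until the remaining families (the rest of (C1)-(C8) and one constraint per cell of Table 2) are added.

```python
# An abridged sketch (ours) of the parameter-search LP in PuLP.
import pulp

k_max = 20
ks = range(1, k_max + 1)
prob = pulp.LpProblem("adwords_parameters", pulp.LpMaximize)

gamma = pulp.LpVariable("Gamma", 0, 1)
da_RL = pulp.LpVariable.dicts("dalpha_RL", ks, lowBound=0)
da_RR = pulp.LpVariable.dicts("dalpha_RR", ks, lowBound=0)
db_RSL = pulp.LpVariable.dicts("dbeta_RSL", ks, lowBound=0)
# ... the remaining parameter families are declared the same way.

prob += gamma                                  # objective: the ratio Gamma

for k in range(1, k_max):                      # monotonicity, cf. Eqn. (C2)
    prob += db_RSL[k] >= db_RSL[k + 1]
prob += db_RSL[1] <= 0.5 * gamma               # Eqn. (C9)
prob += pulp.lpSum(da_RL[k] + da_RR[k] for k in ks) >= 2 * gamma   # Eqn. (C10)
# ... plus one ">= 2 * gamma" constraint per cell of Table 2.

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print("Gamma =", pulp.value(gamma))            # trivial until all families are added
```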
Our code is available online.

A    Small Bids

This section presents the analysis of the algorithm by Mehta et al. [36] for small bids, i.e., $b_{ai} \le B_a/2$ for any advertiser $a \in A$ and any impression $i \in I$. In this case, we consider a deterministic algorithm that achieves a competitive ratio strictly better than $0.5$, which is part of the results by Mehta et al. [36]. We restate and analyze it using the online primal dual framework and the configuration LP as demonstrated in Section 2. This may serve as a warmup for readers who are not familiar with the framework. Further, it is combined with the approach for the large-bids case in Section 4.2 to obtain a hybrid algorithm in Section 6, and to get the competitive ratio as stated in Theorem 1.

Algorithm 6: A Deterministic Online Primal Dual Algorithm by Mehta et al. [36] (Parameterized by a function $\alpha : [0,1] \mapsto [0,1]$)

    State variables: $S_a$, the subset of impressions already assigned to $a$.
    for all impressions $i$ do
        for all advertisers $a \in A$ do
            compute $\beta_i$ according to Eqn. (62)
        end for
        find $a^*$ that maximizes $\beta_i$, and assign $i$ to $a^*$
    end for

A.1    Online Primal Dual Algorithm
Algorithm 6 is driven by maximizing the dual variable $\beta_i$ for each impression $i$. For each advertiser $a$, maintain the following invariant based on the subset of impressions that are already assigned to $a$, denoted as $S_a$:

$$\alpha_a \stackrel{\mathrm{def}}{=} B_a \cdot \alpha\Big( \frac{b_a(S_a)}{B_a} \Big), \qquad (60)$$

where $\alpha : [0,1] \mapsto [0,1]$ is a function to be optimized in the analysis. The online primal dual algorithm and analysis shall impose several conditions on $\alpha$, which we shall explain shortly in the next subsection.

The computation of $\beta_i$ is based on the online primal dual framework in Lemma 2. First, let the primal objective equal the dual objective, i.e., $P = D$. In fact, we make the increments of the primal and dual objectives equal in each round of assignments. That is, if an impression $i$ is assigned to an advertiser $a$, the assigned subset of impressions to advertiser $a$ changes from $S_a$ to $S_a \cup \{i\}$:

$$\Delta D = \Delta P = b_a(S_a \cup \{i\}) - b_a(S_a).$$

Second, we divide the increment of the dual objective into two parts, the increment of $\alpha_a$ and the value of $\beta_i$. By Eqn. (60), the former equals:

$$\Delta \alpha_a = B_a \Big( \alpha\Big( \frac{b_a(S_a \cup \{i\})}{B_a} \Big) - \alpha\Big( \frac{b_a(S_a)}{B_a} \Big) \Big).$$

For convenience of notation, for any $y \in [0,1]$, define:

$$\beta(y) \stackrel{\mathrm{def}}{=} y - \alpha(y). \qquad (61)$$

Thus, define:

$$\beta_i \stackrel{\mathrm{def}}{=} \Delta D - \Delta \alpha_a = B_a \Big( \beta\Big( \frac{b_a(S_a \cup \{i\})}{B_a} \Big) - \beta\Big( \frac{b_a(S_a)}{B_a} \Big) \Big). \qquad (62)$$

Algorithm 6 assigns each impression $i$ to a neighboring advertiser to maximize the value of $\beta_i$.
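For readers who prefer code, here is a minimal runnable sketch of Algorithm 6; the instance encoding is ours, and we plug in the trade-off function $\alpha$ derived at the end of this section.

```python
def alpha(y: float) -> float:
    # Trade-off function derived in Section A.2 (Gamma = 5/9).
    return 4 * y / 9 if y <= 0.5 else 2 * y / 3 - 1 / 9

def beta(y: float) -> float:
    return y - alpha(y)                       # Eqn. (61)

def greedy_dual(budgets: dict, impressions: list) -> float:
    spent = {a: 0.0 for a in budgets}         # b_a(S_a) per advertiser
    total = 0.0
    for bids in impressions:                  # bids: advertiser -> bid for i
        def beta_i(a):                        # the offer in Eqn. (62)
            B, before = budgets[a], spent[a]
            after = min(B, before + bids[a])
            return B * (beta(after / B) - beta(before / B))
        a_star = max(bids, key=beta_i)        # maximize the dual variable
        gain = min(budgets[a_star], spent[a_star] + bids[a_star]) - spent[a_star]
        spent[a_star] += gain                 # primal increment = payment
        total += gain
    return total

# Two advertisers with unit budgets; bids satisfy b_ai <= B_a / 2.
print(greedy_dual({"a1": 1.0, "a2": 1.0},
                  [{"a1": 0.5, "a2": 0.4}] * 3))   # 1.4
```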
A.2    Online Primal Dual Analysis

This subsection presents the online primal dual analysis using the framework in Lemma 2, and proves the following theorem.
Theorem 41.
Algorithm 6 is $\frac{5}{9}$-competitive for AdWords under the small-bids assumption.

We first derive sufficient conditions on $\alpha$ which imply the approximate dual feasibility. Then, we optimize $\alpha$ by solving a set of inequalities derived from the feasibility analysis. Recall the approximate dual feasibility in Lemma 2. For any advertiser $a$ and any subset of impressions $S \subseteq I$, we need:

$$\alpha_a + \sum_{i \in S} \beta_i \ge \Gamma \cdot b_a(S). \qquad (63)$$

Conditions on $\alpha$. We first give some conditions for $\alpha$, which simplify the subsequent analysis.

1. Initial values.
The above primal and dual assignments ensure equal increments in the primal and dual objectives in every step. In order to get equal primal and dual objectives, we further need them to have value $0$ initially. It follows from its definition that the primal objective equals $0$ at the beginning. Thus we need:

$$\alpha(0) = 0, \qquad \beta(0) = 0. \qquad (64)$$

2. Convexity of $\alpha$ and concavity of $\beta$. We shall choose $\alpha$ such that the portion of the primal increment that is assigned to $\alpha_a$, i.e., the ratio of the increment of $\alpha_a$ to the primal increment, is nondecreasing in the value of $\alpha_a$. This further implies the concavity of $\beta$ by definition. We need:

$$\forall y \in [0,1]: \quad \alpha''(y) \ge 0, \qquad \beta''(y) \le 0. \qquad (65)$$

This condition is driven by the online primal dual analysis of approximate dual feasibility, i.e., Eqn. (63). In particular, the crux is the case when none of $i \in S$ is assigned to $a$ and thus, the value of $\beta_i$ is lower bounded by what advertiser $a$ offers in Eqn. (62). When $\alpha_a$ is smaller, we need to offer a larger portion of the gain to $\beta_i$ in order to guarantee approximate dual feasibility; and vice versa.

3. Curvature of $\alpha$. We restrict the curvature of $\alpha$ with upper and lower bounds on its derivative, which implies bounds on the derivative of $\beta$ by definition:
$$\forall y \in [0,1]: \quad 1 - \Gamma \le \alpha'(y) \le 1, \qquad 0 \le \beta'(y) \le \Gamma. \qquad (66)$$

The upper bound on $\alpha'$ and the lower bound on $\beta'$ ensure that the assignment of the $\beta_i$'s in Eqn. (62) satisfies nonnegativity. The lower bound on $\alpha'$ and the upper bound on $\beta'$ are driven by the observation that offering a $\Gamma$ portion of the gain of an edge $(a,i)$ to $\beta_i$ is sufficient for covering the contribution of the edge to the RHS of Eqn. (63).

Contribution from $\beta_i$. We next show the approximate dual feasibility, i.e., Eqn. (63), by characterizing the contribution from $i \in S$ for any $S \subseteq I$. Let $S_a$ be the set of impressions assigned to advertiser $a$.

Lemma 42.
For any impression $i \in S$:

$$\beta_i \ge B_a \Big( \beta\Big( \frac{b_a(S_a \cup \{i\})}{B_a} \Big) - \beta\Big( \frac{b_a(S_a)}{B_a} \Big) \Big).$$

Proof.
Any impression $i \in S$ could have gotten a share equal to the bound above, except that $S_a$ might be a smaller subset at the time when $i$ arrives; the RHS above is therefore a valid lower bound by the concavity of $\beta$ and the definition of the algorithm.

Proof of Theorem 41. Combining Lemma 42 with the definition of $\alpha_a$, it remains to prove that:

$$B_a \alpha\Big( \frac{b_a(S_a)}{B_a} \Big) + \sum_{i \in S} B_a \Big( \beta\Big( \frac{b_a(S_a \cup \{i\})}{B_a} \Big) - \beta\Big( \frac{b_a(S_a)}{B_a} \Big) \Big) \ge \Gamma \cdot b_a(S).$$

Next, we simplify the above inequality using the sufficient conditions in Equations (65) and (66). First, dividing both sides by $B_a$, it becomes clear that only the ratios of the bids $b_{ai}$ to the budget $B_a$ matter. Hence, we may w.l.o.g. normalize $B_a = 1$ to simplify notations. The above inequality turns into:

$$\alpha\big( b_a(S_a) \big) + \sum_{i \in S} \Big( \beta\big( b_a(S_a \cup \{i\}) \big) - \beta\big( b_a(S_a) \big) \Big) \ge \Gamma \cdot b_a(S),$$

where the definition of $b_a(\cdot)$ becomes:

$$\forall S \subseteq I: \quad b_a(S) \stackrel{\mathrm{def}}{=} \min\Big\{ 1,\ \sum_{i \in S} b_{ai} \Big\}.$$

Second, we claim that it suffices to consider the case when $\sum_{i \in S} b_{ai} = B_a = 1$. If $\sum_{i \in S} b_{ai}$ is strictly larger than $B_a$, we may decrease some $b_{ai}$: the LHS weakly decreases while the RHS remains the same. If $\sum_{i \in S} b_{ai}$ is strictly smaller than $B_a$, on the other hand, we may increase some $b_{ai}$ so that the LHS increases at rate at most $\Gamma$ (by the definition of $b_a$ and Eqn. (66)), while the RHS increases at rate exactly $\Gamma$.

Finally, recall that the small-bids assumption ensures $b_{ai} \le B_a/2$. Thus, by the concavity of $\beta$ (Eqn. (65)) and of $b_a(\cdot)$, it suffices to consider $|S| = 2$ with $b_{ai} = \frac12$ for both $i \in S$.

Therefore, for any $b = b_a(S_a)$, we need:

$$\alpha(b) + 2 \cdot \big( \beta(\min\{b + \tfrac12,\, 1\}) - \beta(b) \big) \ge \Gamma.$$

Combining with the definition of $\beta$ in Eqn. (61), and writing the cases of $0 \le b \le \frac12$ and $\frac12 < b \le 1$ separately:

$$0 \le b \le \tfrac12: \quad 3\alpha(b) - 2\alpha(b + \tfrac12) + 1 \ge \Gamma;$$
$$\tfrac12 < b \le 1: \quad 3\alpha(b) - 2\alpha(1) + 2(1-b) \ge \Gamma.$$

Solving it with the boundary condition $\alpha(0) = 0$ (Eqn. (64)) gives $\Gamma = \frac59$ with:

$$\alpha(y) = \begin{cases} \frac{4}{9}\, y & 0 \le y \le \frac12;\\[2pt] \frac{2}{3}\, y - \frac19 & \frac12 < y \le 1. \end{cases}$$
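As a sanity check, one can verify numerically that this $\alpha$ satisfies both families of inequalities with $\Gamma = 5/9$; in fact, both hold with equality. A quick sketch (ours):

```python
def alpha(y: float) -> float:
    return 4 * y / 9 if y <= 0.5 else 2 * y / 3 - 1 / 9

GAMMA = 5 / 9
for t in range(1001):
    b = t / 1000
    if b <= 0.5:
        lhs = 3 * alpha(b) - 2 * alpha(b + 0.5) + 1          # first family
    else:
        lhs = 3 * alpha(b) - 2 * alpha(1.0) + 2 * (1 - b)    # second family
    assert lhs >= GAMMA - 1e-9, (b, lhs)
print("both constraint families hold with Gamma = 5/9")
```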
References

[1] Gagan Aggarwal, Gagan Goel, Chinmay Karande, and Aranyak Mehta. Online vertex-weighted bipartite matching and single-bid budgeted allocations. In
Proceedings of the 22nd Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1253–1264. SIAM, 2011.

[2] Kenneth S Alexander. A counterexample to a correlation inequality in finite sampling. The Annals of Statistics, pages 436–439, 1989.

[3] Itai Ashlagi, Maximilien Burq, Chinmoy Dutta, Patrick Jaillet, Amin Saberi, and Chris Sholley. Edge weighted online windowed matching. In Proceedings of the 2019 ACM Conference on Economics and Computation, pages 729–742, 2019.

[4] Niv Buchbinder, Kamal Jain, and Joseph Seffi Naor. Online primal-dual algorithms for maximizing ad-auctions revenue. In Proceedings of the 15th Annual European Symposium on Algorithms, pages 253–264. Springer, 2007.

[5] Internet Advertising Bureau. IAB internet advertising revenue report: 2016 full year results, 2017.

[6] Internet Advertising Bureau. IAB internet advertising revenue report: 2019 first six months results, 2019.

[7] Nikhil R Devanur and Thomas P Hayes. The adwords problem: online keyword matching with budgeted bidders under random permutations. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 71–78, 2009.

[8] Nikhil R Devanur and Kamal Jain. Online matching with concave returns. In Proceedings of the 44th Annual ACM Symposium on Theory of Computing, pages 137–144, 2012.

[9] Nikhil R Devanur, Kamal Jain, Balasubramanian Sivan, and Christopher A Wilkens. Near optimal online algorithms and fast approximation algorithms for resource allocation problems. In Proceedings of the 12th ACM Conference on Electronic Commerce, pages 29–38, 2011.

[10] Nikhil R Devanur, Kamal Jain, and Robert D Kleinberg. Randomized primal-dual analysis of ranking for online bipartite matching. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 101–107. SIAM, 2013.

[11] Nikhil R Devanur, Zhiyi Huang, Nitish Korula, Vahab S Mirrokni, and Qiqi Yan. Whole-page optimization and submodular welfare maximization with online bidders. ACM Transactions on Economics and Computation (TEAC), 4(3):1–20, 2016.

[12] Matthew Fahrbach and Morteza Zadimoghaddam. Online weighted matching: breaking the 1/2 barrier. arXiv preprint arXiv:1704, 2017.

[13] Matthew Fahrbach, Zhiyi Huang, Runzhou Tao, and Morteza Zadimoghaddam. Edge-weighted online bipartite matching. In Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science. IEEE, 2020.

[14] Jon Feldman, Nitish Korula, Vahab Mirrokni, Shanmugavelayutham Muthukrishnan, and Martin Pál. Online ad assignment with free disposal. In Proceedings of the 5th International Workshop on Internet and Network Economics, pages 374–385. Springer, 2009.

[15] Jon Feldman, Aranyak Mehta, Vahab Mirrokni, and Shan Muthukrishnan. Online stochastic matching: beating 1 − 1/e. In Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science, pages 117–126. IEEE, 2009.

[16] Buddhima Gamlath, Michael Kapralov, Andreas Maggiori, Ola Svensson, and David Wajc. Online matching with general arrivals. In
Proceedings of the 60th Annual IEEE Symposium on Foundations of Computer Science, pages 26–37. IEEE, 2019.

[17] Gagan Goel and Aranyak Mehta. Online budgeted matching in random input models with applications to AdWords. In Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 982–991, 2008.

[18] Bernhard Haeupler, Vahab S Mirrokni, and Morteza Zadimoghaddam. Online stochastic weighted matching: Improved approximation algorithms. In International Workshop on Internet and Network Economics, pages 170–181. Springer, 2011.

[19] Zhiyi Huang. Understanding Zadimoghaddam's edge-weighted online matching algorithm: weighted case. arXiv preprint arXiv:1910.03287, 2019.

[20] Zhiyi Huang and Runzhou Tao. Understanding Zadimoghaddam's edge-weighted online matching algorithm: unweighted case. arXiv preprint arXiv:1910.02569, 2019.

[21] Zhiyi Huang and Qiankun Zhang. Online primal dual meets online matching with stochastic rewards: configuration LP to the rescue. In Proceedings of the 52nd ACM Symposium on Theory of Computing, 2020.

[22] Zhiyi Huang, Ning Kang, Zhihao Gavin Tang, Xiaowei Wu, Yuhao Zhang, and Xue Zhu. How to match when all vertices arrive online. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 17–29, 2018.

[23] Zhiyi Huang, Zhihao Gavin Tang, Xiaowei Wu, and Yuhao Zhang. Online vertex-weighted bipartite matching: beating 1 − 1/e with random arrivals. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

[24] Zhiyi Huang, Binghui Peng, Zhihao Gavin Tang, Runzhou Tao, Xiaowei Wu, and Yuhao Zhang. Tight competitive ratios of classic matching algorithms in the fully online model. In Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2875–2886. SIAM, 2019.

[25] Zhiyi Huang, Ning Kang, Zhihao Gavin Tang, Xiaowei Wu, Yuhao Zhang, and Xue Zhu. Fully online matching. Journal of the ACM, 67(3):1–25, 2020.

[26] Zhiyi Huang, Zhihao Gavin Tang, Xiaowei Wu, and Yuhao Zhang. Fully online matching II: beating ranking and water-filling. In Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science. IEEE, 2020.

[27] Patrick Jaillet and Xin Lu. Online stochastic matching: New algorithms with better bounds. Mathematics of Operations Research, 39(3):624–646, 2014.

[28] Michael Kapralov, Ian Post, and Jan Vondrák. Online submodular welfare maximization: Greedy is optimal. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1216–1225. SIAM, 2013.

[29] Chinmay Karande, Aranyak Mehta, and Pushkar Tripathi. Online bipartite matching with unknown distributions. In
Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, pages 587–596, 2011.

[30] Richard M Karp, Umesh V Vazirani, and Vijay V Vazirani. An optimal algorithm for on-line bipartite matching. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, pages 352–358, 1990.

[31] Thomas Kesselheim, Klaus Radke, Andreas Tönnis, and Berthold Vöcking. An optimal online algorithm for weighted bipartite matching and extensions to combinatorial auctions. In Proceedings of the 21st Annual European Symposium on Algorithms, pages 589–600. Springer, 2013.

[32] Mohammad Mahdian and Qiqi Yan. Online bipartite matching with random arrivals: an approach based on strongly factor-revealing LPs. In Proceedings of the 43rd Annual ACM Symposium on Theory of Computing, pages 597–606, 2011.

[33] Vahideh H Manshadi, Shayan Oveis Gharan, and Amin Saberi. Online stochastic matching: Online actions based on offline statistics. Mathematics of Operations Research, 37(4):559–573, 2012.

[34] Aranyak Mehta. Online matching and ad allocation. Foundations and Trends in Theoretical Computer Science, 8(4):265–368, 2013.

[35] Aranyak Mehta and Debmalya Panigrahi. Online matching with stochastic rewards. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer Science, pages 728–737. IEEE, 2012.

[36] Aranyak Mehta, Amin Saberi, Umesh Vazirani, and Vijay Vazirani. Adwords and generalized online matching. Journal of the ACM, 54(5):22:1–22:19, 2007.

[37] Aranyak Mehta, Bo Waggoner, and Morteza Zadimoghaddam. Online stochastic matching with unequal probabilities. In Proceedings of the 26th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1388–1404. SIAM, 2014.

[38] Vahab S Mirrokni, Shayan Oveis Gharan, and Morteza Zadimoghaddam. Simultaneous approximations for adversarial and stochastic online budgeted allocation. In Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1690–1701. SIAM, 2012.

[39] Yajun Wang and Sam Chiu-wai Wong. Two-sided online bipartite matching and vertex cover: Beating the greedy algorithm. In