Algorithms for Manipulating Sequential Allocation
Mingyu Xiao
School of Computer Science and Engineering, University of Electronic Science and Technology of China, China, [email protected]
Jiaxing Ling
School of Computer Science and Engineering, University of Electronic Science and Technology of China, China
Abstract
Sequential allocation is a simple and widely studied mechanism to allocate indivisible items in turns to agents according to a pre-specified picking sequence of agents. At each turn, the current agent in the picking sequence picks its most preferred item among all items that have not been allocated yet. This mechanism is well known not to be strategyproof, i.e., an agent may get more utility by reporting an untruthful preference ranking of items. This raises the question: how do we find the best response of an agent? It is known that this problem is polynomially solvable for only two agents and NP-complete for an arbitrary number of agents. The computational complexity of the problem with three agents was left as an open problem. In this paper, we give a novel algorithm that solves the problem in polynomial time for each fixed number of agents. We also show that an agent can always get at least half of its optimal utility by simply using its truthful preference as the response.
Introduction

Sequential allocation is a simple and widely studied mechanism to allocate indivisible items to agents [7, 8, 3]. In a sequential allocation mechanism, there are several indivisible items to be allocated to some agents, each agent has a strict preference ranking over all the items, and there is a sequence of the agents, called the policy, which specifies the turns in which the agents get items. The items are allocated to the agents according to the policy: at each turn, the current agent on the policy picks the most preferred item in its preference ranking that has not yet been allocated. We give an example.
Example 1
There are five items {a, b, c, d, e} and three agents {1, 2, 3} with preference rankings

Agent 1: a ≻ b ≻ c ≻ d ≻ e
Agent 2: c ≻ b ≻ e ≻ d ≻ a
Agent 3: e ≻ b ≻ d ≻ c ≻ a

and a policy π: 13221. In this example, Agent 1 will take a at the first turn, Agent 3 will take e at the second turn, Agent 2 will take two items c and b at the third and fourth turns, and Agent 1 will take d at the last turn.

In sequential allocation, given a fixed policy, the outcome depends only on the ordinal preference rankings of the agents over the items. It is folklore that sequential allocation is not strategyproof, which means that an agent may get more utility by reporting an untruthful preference ranking. For example, in the above instance, if Agent 1 misreports its preference ranking as b ≻ a ≻ c ≻ d ≻ e, then it will get items {b, a}, while originally it gets items {a, d}. Agent 1 may get more utility by taking {b, a} since b ≻ d. This motivates the study of many aspects of this mechanism.

There are several models based on the sequential allocation mechanism. They differ in their objectives, e.g., maximizing the overall social welfare [7] or the utility of a certain agent [8], and in their requirements on the pattern of the picking sequences and the number of agents. One of the earliest models, studied in [13], has two agents and a strictly alternating policy (e.g., 121212...). A balanced alternation pattern of the policy (e.g., 12212112...) was studied in [9]. One interesting application of sequential allocation is course allocation to students, and several axiomatic properties and the manipulability of this application have been revealed. Budish and Cantillon [10] investigated a randomized version of the sequential allocation mechanism to allocate courses to students, and Pareto optimal solutions for a model of course allocation were studied in [11]. The Boston mechanism is another sequential allocation mechanism with applications in school choice for students [1, 14]. A general and systematic study of sequential allocation was done by Bouveret and Lang [7] and by Kalinowski et al. [12] from a game-theoretic view. Since the work of Kohler and Chandrasekaran [13], a series of follow-up works on strategic aspects of sequential allocation have appeared [5, 6, 15, 16, 4].

In this paper, we consider manipulations in sequential allocation. In this model, the policy is given, and among all agents, one is the manipulator and all others are non-manipulators. The manipulator needs to report a list of items as its preference ranking to achieve a certain objective. There are two commonly used assumptions. Firstly, the manipulator has complete information about the reported preferences of the non-manipulators. This is a worst-case assumption often made in computer science and economics. Secondly, the manipulator has additive cardinal utilities for the items, although agents report strict and ordinal preferences. This assumption is standard in this research area.

We can define several problems with different objectives. The Best Response problem is to find a best response of the manipulator (i.e., a preference ranking which allows it to obtain the maximum utility).
Better Than Truth Response is to ask whether the manipulator can get more utility than the allocation under its truthful report.
Allocation Response is to ask whether the manipulator can get a specified bundle of items. Among all these problems,
Best Response seems to be the hardest one, and a solution to it implies solutions to the other problems, since the other problems can easily be reduced to
Best Response. See [2] for a recent survey on the results for these problems.
For Best Response, Bouveret and Lang [7] first showed that the problem with only two agents (one manipulator and one non-manipulator) can be solved in polynomial time. Then Aziz et al. [3] proved that it is NP-hard to compute the best response of the manipulator if the number of agents is part of the input, correcting a wrong claim in a previous paper. It remained an open problem whether
Best Response is polynomially solvable for three or any other constant number of agents [3]. This open problem is interesting because it is already known that the problem is polynomially solvable when the utility function of the manipulator has some specific form, such as lexicographic utilities or binary utilities [7, 3]. In this paper, we fully answer this question by giving a dynamic programming algorithm for
Best Response that runs in polynomial time for any fixed number of agents and any additive utility function. In addition, we show that the manipulator can always get at least half of the optimal utility if it simply uses the truthful preference ranking, and this approximation ratio is tight as far as the truthful preference ranking is used.
Preliminaries

In the sequential allocation problem, m items are to be allocated to n agents according to a policy π, which is a sequence of agents specifying the turns in which the agents get items. The length |π| of the policy is m since there are m items to be allocated. The set of items is denoted by O = {g_1, g_2, ..., g_m} and the set of agents is denoted by N = {1, 2, ..., n}, where Agent 1 is the manipulator and all other agents are non-manipulators. Each agent i ∈ N has a complete preference ranking ≻_i: g_{i_1}, g_{i_2}, ..., g_{i_m} over all items in O. We write g_p ≻_i g_q to denote that item g_p is ranked ahead of g_q in Agent i's preference ranking. The manipulator (Agent 1) has an additive utility function on the items u: O → ℝ_+. For two items g_x, g_y ∈ O, it holds that u(g_x) > u(g_y) if and only if g_x ≻_1 g_y. We use k_i (i ∈ N) to denote the frequency of Agent i appearing in the policy π, and use m' to denote the frequency of non-manipulators appearing in π. Then it holds that m = Σ_{i=1}^{n} k_i and m' = m − k_1.
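To make the mechanism concrete, the following minimal Python sketch (our own illustrative code, not part of the paper; all names are ours) simulates sequential allocation for reported rankings and reproduces Example 1.

def sequential_allocation(prefs, policy):
    """Simulate sequential allocation.

    prefs  : dict mapping each agent to its reported ranking (most preferred first);
             for the manipulator this is its reported picking strategy.
    policy : list of agents giving the picking order.
    Returns a dict mapping each agent to the list of items it receives, in picking order.
    """
    remaining = set(item for ranking in prefs.values() for item in ranking)
    bundles = {agent: [] for agent in prefs}
    for agent in policy:
        # The current agent takes its most preferred item that is still unallocated.
        pick = next(item for item in prefs[agent] if item in remaining)
        remaining.remove(pick)
        bundles[agent].append(pick)
    return bundles

# Example 1: truthful reports versus Agent 1 misreporting b > a > c > d > e.
prefs = {1: list("abcde"), 2: list("cbeda"), 3: list("ebdca")}
policy = [1, 3, 2, 2, 1]
print(sequential_allocation(prefs, policy))                         # Agent 1 gets ['a', 'd']
print(sequential_allocation({**prefs, 1: list("bacde")}, policy))   # Agent 1 gets ['b', 'a']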
For Best Response, the manipulator wants to find a picking strategy achieving its maximum utility, i.e., a permutation of all the items such that, by picking items according to it, the manipulator gets the maximum utility. When we speak of a solution to Best Response, we mean the optimal picking strategy or the bundle of items for the manipulator determined by the optimal picking strategy. We use I = (O, N, π, {≻_i}_{i=1}^n) to denote our input instance, where we omit the utility function of the manipulator to simplify the description, since in most cases we only use the preference ranking ≻_1.

Once a picking strategy is given, we get a fixed sequence of allocations of all items to agents, called an allocation sequence. We say the above picking strategy and allocation sequence are associated with each other. If there is no picking strategy associated with an allocation sequence, the allocation sequence is called infeasible; otherwise, it is called feasible. For a feasible allocation sequence, it is easy to construct a picking strategy associated with it.

A partial allocation sequence is a subsequence of an allocation sequence beginning from the first allocation. We will use ξ to denote an allocation sequence and ξ(i) to denote the partial allocation sequence of the first i allocations of ξ. For each feasible partial allocation sequence of length l, there is a partial policy of length l and a partial picking strategy associated with it. After executing a partial allocation sequence according to a partial policy, we get a remaining problem, which is to allocate the remaining items to the agents according to the remaining policy.

Given a (partial) allocation sequence, we say an item g has been considered by Agent i before the xth position of the (partial) policy if, during the first x allocations in the sequence, the last item allocated to Agent i is ranked lower than item g in Agent i's preference ranking. Note that an item may not be allocated to an agent even if the item has been considered by the agent.

A segment in a policy is a maximal continuous subsequence containing at most one position of a non-manipulator, and only the last position of the subsequence can be the non-manipulator. A policy having m' positions of non-manipulators can be partitioned into m' + 1 segments by cutting after each non-manipulator position, where the last segment is called the trivial segment. The trivial segment contains only copies of the manipulator and may be empty (when the last position of the policy is a non-manipulator). A nontrivial segment contains exactly one non-manipulator, at its last position. We will use π_s(x) to denote the partial policy consisting of the first x segments of π. The core of a (partial) policy is the sequence of agents obtained by deleting all occurrences of the manipulator from the (partial) policy. See Figure 1 for an illustration of the segments and the core.
Figure 1: The segments and the core. (Figure omitted in this text version: it shows a policy partitioned into segments after each non-manipulator position, the trivial segment at the end, and the core obtained by deleting the manipulator's occurrences.)

The position vector of the manipulator in a (partial) policy π is a sequence of increasing positive integers (z_1, z_2, ..., z_{k_1}) recording the positions of the manipulator in π, i.e., the manipulator appears at the z_1th, z_2th, ..., and z_{k_1}th positions of the (partial) policy π. A policy π dominates another policy π' if they have the same length and the same core and it holds that z_i ≤ z'_i for all i ∈ {1, 2, ..., k_1}, where (z_1, z_2, ..., z_{k_1}) and (z'_1, z'_2, ..., z'_{k_1}) are the position vectors of the manipulator in π and π', respectively. Equivalently, π' can be obtained from π by iteratively moving an occurrence of the manipulator to a later position. For two instances I = (O, N, π, {≻_i}_{i=1}^n) and I' = (O, N, π', {≻_i}_{i=1}^n) that differ only in their policies, if π dominates π', then we say instance I dominates instance I'.

Our algorithm for Best Response uses two major ideas. The first idea is to reduce instances to constrained instances, called "crucial instances". Crucial instances can be solved quickly and directly. However, it is not easy to find the corresponding crucial instances, and we still need to search among a large number of candidates. So we also use a second idea, a divide-and-conquer technique, to reduce the number of candidates. The divide-and-conquer method splits the allocation problem into two subproblems: the first one is to allocate a fixed set of items and the second one is to allocate the remaining set of items. To guarantee that we can combine optimal solutions to the two parts into an optimal solution for the whole problem, we need some "invariance properties". Based on these invariance properties, we are able to design a dynamic programming algorithm that saves running time. We first introduce the two ideas in the following two sections.

In Best Response, two instances that differ only in their policies may have the same optimal picking strategy. Such instances share some common properties. We will classify instances (and their policies) that have the same optimal picking strategy and solution into a class. In each class, there is a special instance, called a "crucial instance", which can be solved directly. So we will try to solve an instance by solving the corresponding crucial instance in the same class. This is the rough idea of our algorithm.

We give an example to illustrate that two instances differing only in their policies can have the same optimal solution. In Example 1, the manipulator gets the best bundle {a, b} by using picking strategy bacde. We use I' to denote the instance after replacing the policy π: 13221 with policy π': 32121. In I', the manipulator can get the same best bundle {a, b} by using the same picking strategy. Compared with π, the manipulator has a lower priority to pick items in π'. However, the manipulator can still get the best solution. Note that, at the first position of the policy π, the manipulator picks an item that will not be considered by any non-manipulator before the 3rd allocation. So we can delay the allocation of b to Agent 1 from position 1 to position 3 without changing the optimality. Given an instance, we want to know how much we can delay the positions of the manipulator without losing optimality, and the "worst" such policy will be "crucial".

Definition 1 (Crucial Instance)
For an instance I = (O, N, π, {≻_i}_{i=1}^n), if for every policy π' ≠ π dominated by π, the optimal solution to the dominated instance I' = (O, N, π', {≻_i}_{i=1}^n) is worse than that to I, then we say I is a crucial instance. Let I' = (O, N, π', {≻_i}_{i=1}^n) be a crucial instance; for any instance I = (O, N, π, {≻_i}_{i=1}^n) dominating I', we say I' is a corresponding crucial instance of I.

A corresponding crucial instance of an instance may be the instance itself when it is already a crucial instance. To solve an instance, we can turn to solving a corresponding crucial instance, by the lemmas that follow.
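Before the lemmas, the following small Python helpers (ours, for illustration only; the list-of-agents encoding of a policy is an assumption of this sketch) make the notions of core, segment, position vector, and domination concrete.

def core(policy, manipulator=1):
    """Core of a (partial) policy: delete all occurrences of the manipulator."""
    return [a for a in policy if a != manipulator]

def segments(policy, manipulator=1):
    """Split a policy into segments by cutting after each non-manipulator position.
    The last segment (possibly empty) is the trivial segment."""
    segs, current = [], []
    for a in policy:
        current.append(a)
        if a != manipulator:
            segs.append(current)
            current = []
    segs.append(current)  # trivial segment: only manipulator positions, possibly empty
    return segs

def position_vector(policy, manipulator=1):
    """1-indexed positions of the manipulator in the policy."""
    return [i for i, a in enumerate(policy, start=1) if a == manipulator]

def dominates(pi, pi_prime, manipulator=1):
    """pi dominates pi_prime: same length, same core, and the manipulator appears
    no later in pi than in pi_prime, position by position."""
    if len(pi) != len(pi_prime) or core(pi, manipulator) != core(pi_prime, manipulator):
        return False
    z, z_prime = position_vector(pi, manipulator), position_vector(pi_prime, manipulator)
    return all(a <= b for a, b in zip(z, z_prime))

# The policy of Example 1 dominates the modified policy 32121 discussed above.
print(segments([1, 3, 2, 2, 1]))                     # [[1, 3], [2], [2], [1]]
print(dominates([1, 3, 2, 2, 1], [3, 2, 1, 2, 1]))   # True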
Lemma 1
Given two instances I = (O, N, π, {≻_i}_{i=1}^n) and I' = (O, N, π', {≻_i}_{i=1}^n), where I dominates I'. By using the same picking strategy, the manipulator in instance I gets a bundle with total utility not less than that in I'. Furthermore, for any picking strategy S', there is a picking strategy S such that by using S in I the manipulator gets the same bundle as that obtained by using S' in I'.

Proof. The first claim is easy to observe. We focus on the second claim. We define the picking strategy S for I as follows: order the items according to the order in which they are allocated to the manipulator in I' under S', i.e., an item is ranked at the ith position in S if it is the ith item allocated to the manipulator in I', and all other items not allocated to the manipulator in I' are listed behind in any order. Let (z_1, z_2, ..., z_{k_1}) and (z'_1, z'_2, ..., z'_{k_1}) be the position vectors of the manipulator in π and π'. Since π dominates π', we know that z_i ≤ z'_i for any i ∈ {1, 2, ..., k_1}. If an item can be allocated to the manipulator at position z'_i in π', then it can also be allocated to the manipulator at position z_i ≤ z'_i in π, since only a subset of the items has been allocated before position z_i in π (compared to the situation at position z'_i in π'). So at each of its positions, the manipulator can always get the current item of its picking strategy S. By using S, the manipulator in I gets the same bundle as that obtained by using S' in I'.
Let I = (O, N, π, {≻_i}_{i=1}^n) be an instance and S be a picking strategy for the manipulator. Assume that by taking S the manipulator picks an item at the ith position of π and this item is not considered by any non-manipulator before the jth allocation, where j > i + 1. Let π' be the new policy obtained from π by moving the manipulator from the ith position to the (j − 1)th position. By using picking strategy S in I' = (O, N, π', {≻_i}_{i=1}^n), the manipulator gets the same bundle as that obtained by using S in I.

Proof. We consider the allocation sequence under the picking strategy S in I: g_1 ↦_1 l_1, g_2 ↦_2 l_2, ..., g_m ↦_m l_m, where g_x ↦_x l_x means that item g_x is allocated to agent l_x at position x, l_1, l_2, ..., l_m ∈ N, and it is possible that l_j = l_k for j ≠ k. In this allocation sequence, item g_i is allocated to the manipulator at position i, and before the allocation of the jth item g_j, the item g_i has never been considered by any non-manipulator. Therefore, moving the ith position of π to the (j − 1)th position does not affect the allocations at the original (i + 1)th to (j − 1)th positions, and we can still allocate g_i to the manipulator at the (j − 1)th position since it is still available. All other allocations remain unchanged. Thus the following is still a feasible allocation sequence for I': g_1 ↦_1 l_1, ..., g_{i−1} ↦_{i−1} l_{i−1}, g_{i+1} ↦_i l_{i+1}, ..., g_i ↦_{j−1} l_i, g_j ↦_j l_j, ..., g_m ↦_m l_m, in which the manipulator gets the same bundle as in I.
Let I = (O, N, π, {≻_i}_{i=1}^n) be an instance and I' = (O, N, π', {≻_i}_{i=1}^n) be a corresponding crucial instance. An optimal picking strategy for I' is also an optimal picking strategy for I.

Next, we show how to solve crucial instances.
Lemma 3
Crucial instances of
Best Response can be solved in linear time.
To prove Lemma 3, we use an algorithm that solves crucial instances optimally. The algorithm is a greedy algorithm, called GreedyAlg. We introduce the algorithm separately below because it will also be used in several places later.
Algorithm GreedyAlg
The algorithm GreedyAlg takes a (sub) instance of
Best Response as the input, and outputs an allocation sequence with the corresponding picking strategy for the manipulator. However, the output allocation sequence may not be optimal for non-crucial instances.

The main idea of the algorithm is as follows. We allocate items to agents according to the policy. Assume that we have allocated the first i − 1 items of the instance I = (O, N, π, {≻_i}_{i=1}^n). If the ith position in the policy π is a non-manipulator, we let the non-manipulator pick its most preferred item that has not yet been allocated. Next, we consider the situation where the ith position in the policy π is the manipulator. The algorithm decides the item that should be assigned to the manipulator at the ith turn by the following method. Let I' = (O', N, π', {≻'_i}_{i=1}^n) be the remaining instance after allocating the first i − 1 items of I. Then the first position in π' is the manipulator. Let π_{−1} be the core of π'. If π_{−1} is empty, we assign the best remaining item o in the truthful preference ranking of the manipulator to the manipulator at the ith position of π and let o be the ith object in the picking strategy of the manipulator. If π_{−1} is not empty, we let o_f be the favourite item in O' of the first agent in π_{−1}. GreedyAlg assigns item o_f to the manipulator at the ith position of π and lets o_f be the ith object in the picking strategy of the manipulator.

According to the above method, the algorithm decides the items to be assigned to the manipulator from the first occurrence of 1 to the last occurrence of 1 in π, and then we get a full allocation sequence and a picking strategy for the manipulator. This is the algorithm GreedyAlg.

It is easy to see that GreedyAlg can be implemented in linear time, and the picking strategy returned by GreedyAlg for each instance is unique.

The allocation sequence returned by GreedyAlg on a (sub) instance is called greedy. Given an allocation sequence, we can easily check whether it is greedy or not. The concept of a greedy allocation sequence is also important and will be used later.

The correctness of Lemma 3 follows directly from Lemma 4 below.
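The following is a minimal Python sketch of GreedyAlg (our own code, not the paper's implementation), using the same instance encoding as the earlier sketches.

def greedy_alg(policy, prefs, manipulator=1):
    """Sketch of GreedyAlg as described above.

    policy : list of agents; prefs : dict agent -> ranking (most preferred first),
    with prefs[manipulator] being the truthful ranking of the manipulator.
    Returns (allocation sequence, picking strategy of the manipulator).
    """
    remaining = set(prefs[manipulator])
    allocation, picking_strategy = [], []
    for pos, agent in enumerate(policy):
        if agent != manipulator:
            # A non-manipulator picks its most preferred unallocated item.
            pick = next(g for g in prefs[agent] if g in remaining)
        else:
            # Remaining core: the non-manipulators occurring after this position.
            rest_core = [a for a in policy[pos + 1:] if a != manipulator]
            if not rest_core:
                # No non-manipulator left: take the best remaining item truthfully.
                pick = next(g for g in prefs[manipulator] if g in remaining)
            else:
                # Take the favourite remaining item of the next non-manipulator.
                pick = next(g for g in prefs[rest_core[0]] if g in remaining)
            picking_strategy.append(pick)
        remaining.remove(pick)
        allocation.append((pick, agent))
    return allocation, picking_strategy

# On Example 1 with policy 13221, GreedyAlg lets the manipulator grab Agent 3's
# favourite item e first and the best remaining item a at the end; as noted above,
# this need not be optimal for non-crucial instances.
prefs = {1: list("abcde"), 2: list("cbeda"), 3: list("ebdca")}
print(greedy_alg([1, 3, 2, 2, 1], prefs))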
Lemma 4 The greedy strategy for a crucial instance is the optimal solution to it.
In fact, a crucial instance has only one optimal allocation sequence, which is the greedy one obtained by GreedyAlg. Note that if, in a solution, the manipulator at the ith position of π does not pick item o_f, then we could move this occurrence of the manipulator to just behind the first following non-manipulator in π to get a dominated instance I', where I' has the same optimal solution as I, which contradicts the fact that I is a crucial instance.

Although crucial instances can be solved quickly, it is still hard to find a corresponding crucial instance for an arbitrary instance. We need to reveal more properties of dominated instances.

Lemma 1 implies that the optimal solution to an instance is not worse than the optimal solution to any dominated instance. Clearly, the opposite direction of Lemma 1 may not hold. We prove the following lemma.

Lemma 5
Let I be an instance and P be the set of instances dominated by I. For each I' ∈ P, we use G(I') to denote the greedy allocation sequence of I'. Assume that the greedy allocation sequence G(I_0) (I_0 ∈ P) gives the best solution among all G(I') with I' ∈ P. Then I_0 is a crucial instance corresponding to I.

The correctness of this lemma follows from Lemma 1, Corollary 1 and Lemma 4. Lemma 1 says that no dominated instance I' has a better solution than I. Corollary 1 says that for at least one dominated instance, namely a corresponding crucial instance, we achieve the same optimal solution as for I. The greedy allocation sequence may not be optimal for an arbitrary instance, but it is optimal for a crucial instance by Lemma 4. Therefore, among all the greedy allocation sequences, the best one is the one for a corresponding crucial instance.

Lemma 5 implies that we can solve Best Response by taking each dominated instance as a candidate for a corresponding crucial instance and using GreedyAlg to solve it. We analyze the running time of this algorithm. Let (z_1, z_2, ..., z_{k_1}) and (z'_1, z'_2, ..., z'_{k_1}) be the position vectors of the manipulator in two policies π and π'. We know that π dominates π' if and only if z_i ≤ z'_i holds for any i ∈ {1, 2, ..., k_1}. The length of these policies is m. So for i ∈ {1, 2, ..., k_1}, the value of z'_i can be any integer between max{z_i, z'_{i−1} + 1} and m. Combinatorial analyses with some relaxations easily establish an upper bound of O(m^{k_1}) on the number of dominated policies. Thus the algorithm that considers all dominated instances is not polynomial when the frequency k_1 of the manipulator in the policy is not a constant.

We will use a dynamic programming technique to reduce the number of dominated instances considered to a polynomial, without losing an optimal solution. To do so, we need the following properties.

Our idea is a divide-and-conquer method. We partition the problem into two subproblems: the first part is to allocate the first i items and the second part is to allocate the remaining items. We need to find the properties of the first part that keep the second part invariant. Once we find these properties, we may only need to find the best allocation sequence of the first part for the manipulator satisfying these properties (for each fixed allocation sequence of the second part). In this way, we may be able to use dynamic programming to remove redundant cases without losing an optimal solution.

It is easy to verify that the remaining problems are the same after executing two partial allocation sequences satisfying the following two conditions:

1. The number of items allocated to each agent (including the manipulator) is the same;
2. The set of items allocated to all the agents is the same.

However, it is still hard to find all partial allocation sequences satisfying the above two conditions. In order to get a polynomial-time algorithm, we add a third condition:

3. The last item allocated to each non-manipulator is the same.

Definition 2 (Invariance Relation)
Two (partial) allocation sequences are in the invariance relation if they satisfy the above three conditions.
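As a sketch, the three conditions can be checked directly for partial allocation sequences encoded as lists of (item, agent) pairs; the encoding and names below are our own.

from collections import Counter

def in_invariance_relation(seq_a, seq_b, manipulator=1):
    """Check the invariance relation between two partial allocation sequences,
    each given as a list of (item, agent) pairs in allocation order."""
    # 1. Each agent (including the manipulator) receives the same number of items.
    if Counter(agent for _, agent in seq_a) != Counter(agent for _, agent in seq_b):
        return False
    # 2. The overall set of allocated items is the same.
    if {item for item, _ in seq_a} != {item for item, _ in seq_b}:
        return False
    # 3. The last item allocated to each non-manipulator is the same.
    def last_items(seq):
        last = {}
        for item, agent in seq:
            if agent != manipulator:
                last[agent] = item
        return last
    return last_items(seq_a) == last_items(seq_b)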
Recall that for an allocation sequence ξ, we use ξ(i) to denote the partial allocation sequence of the first i allocations.

Lemma 6
Let ξ be a feasible allocation sequence and ξ(i) (1 ≤ i ≤ m) be a partial allocation sequence. Let ξ'(i) be another partial allocation sequence that is in the invariance relation with ξ(i). The allocation sequence ξ' obtained by replacing ξ(i) with ξ'(i) in ξ is still a feasible allocation sequence.

Since ξ(i) and ξ'(i) are in the invariance relation, we know that we will get the same remaining problem after executing them. Thus we can exchange ξ(i) and ξ'(i) in larger allocation sequences. The divide-and-conquer idea based on Lemma 6 will be embedded in our dynamic programming algorithm. We will see that the algorithm only splits the problem between segments.

Equipped with the above properties, we are ready to describe the dynamic programming algorithm. The main idea of the algorithm is still based on Lemma 5. However, we will use Lemma 6 to reduce the number of subproblems.

Recall that m' is the number of non-manipulator positions in the policy π. For any integer 1 ≤ x ≤ m', let k_1(x) denote the number of times the manipulator appears during the first x segments of the policy π, i.e., in the period from the beginning of π to the xth position of a non-manipulator. For any dominated instance I', the number of occurrences of the manipulator in the first x segments of the policy in I' is at most k_1(x). Recall that π_s(x) is the partial policy of the first x segments of π.

We use pro(x, y, i_1, ..., i_n) to denote the set of all feasible partial allocation sequences satisfying the following conditions:

1. The core of the partial policy associated with the partial allocation sequence is the same as the core of π_s(x);
2. The last allocation in the partial allocation sequence allocates an item to a non-manipulator;
3. Exactly x items are allocated to non-manipulators and exactly y items are allocated to the manipulator;
4. For each j ∈ {1, 2, ..., n}, the last item allocated to Agent j is the i_j-th item in its preference ranking, where i_j can be 0, which means no item is allocated to Agent j;
5. For each r ∈ {1, 2, ..., x}, during the first r segments at most k_1(r) items are allocated to the manipulator;
6. The partial allocation sequence is a greedy one.

The domains of the parameters in pro(x, y, i_1, ..., i_n) are as follows: x ∈ {0, 1, ..., m'}, y ∈ {0, 1, ..., k_1} and i_1, i_2, ..., i_n ∈ {0, 1, ..., m}. We may not describe the domains of the parameters when they are clear from the context.

Note that even when all of x, y and i_j are fixed, the set pro(x, y, i_1, ..., i_n) may contain several different allocation sequences, because the definition does not fix the positions of the y occurrences of the manipulator in the corresponding (partial) policy. We have the following property.

Lemma 7
Any two partial allocation sequences in pro(x, y, i_1, ..., i_n) are in the invariance relation.

Lemma 7 can be proved by checking each of the three conditions of the invariance relation one by one, which is not hard and is omitted here due to the limited space.

We use opt(x, y, i_1, ..., i_n) to denote a partial allocation sequence in pro(x, y, i_1, ..., i_n) in which the manipulator gets the best solution. Note that pro(x, y, i_1, ..., i_n) may be empty, and in this case we let opt(x, y, i_1, ..., i_n) = ⊥.

The allocation sequence opt(x, y, i_1, ..., i_n), even for x = m', may not be a complete allocation sequence of length m, since y may be smaller than k_1 and some allocations to the manipulator are still left. In fact, opt(x = m', y, i_1, ..., i_n) is a partial allocation sequence missing only the last part of the allocations, corresponding to the trivial segment of the policy. We use opt*(x = m', y, i_1, ..., i_n) to denote the complete allocation sequence obtained from opt(x = m', y, i_1, ..., i_n) by appending the k_1 − y allocations of the k_1 − y best remaining items to the manipulator. The following two lemmas show that the best allocation sequence among all opt*(x, y, i_1, ..., i_n) with x = m' leads to an optimal solution to the original instance.

Lemma 8
For any y ∈ {0, 1, ..., k_1} and i_1, i_2, ..., i_n ∈ {0, 1, ..., m}, if opt(x = m', y, i_1, ..., i_n) ≠ ⊥, then opt*(x = m', y, i_1, ..., i_n) is a greedy allocation sequence for an instance dominated by the original instance.

Proof. By definition, opt(x = m', y, i_1, ..., i_n) is a greedy partial allocation sequence. Since opt*(x = m', y, i_1, ..., i_n) is obtained from opt(x = m', y, i_1, ..., i_n) by appending the k_1 − y best allocations to the manipulator, we know that opt*(x = m', y, i_1, ..., i_n) is also greedy. Consider the policy π* corresponding to the greedy allocation sequence opt*(x = m', y, i_1, ..., i_n). By the fifth condition in the definition of pro(x, y, i_1, ..., i_n), for each r ∈ {1, 2, ..., x}, during the first r segments at most k_1(r) items are allocated to the manipulator. This means π* is dominated by the original policy π.

Lemma 9
Let I_c be a crucial instance corresponding to I, where the trivial segment in I_c consists of z occurrences of the manipulator (0 ≤ z ≤ k_1). Assume that in the optimal solution to I_c, for each j ∈ {1, 2, ..., n}, the last item allocated to Agent j is the a_j-th item in its preference ranking. Then opt*(m', k_1 − z, a_1, ..., a_n) leads to an optimal solution to the original instance I.

Proof. The greedy allocation sequence S of I_c, which also leads to an optimal solution to the original instance I, is a candidate for opt*(m', k_1 − z, a_1, ..., a_n). On the other hand, by Lemmas 8 and 1, we know that opt*(m', k_1 − z, a_1, ..., a_n) is not better than S. Since opt*(m', k_1 − z, a_1, ..., a_n) is chosen as the best candidate, we know that opt*(m', k_1 − z, a_1, ..., a_n) is as good as S.

We cannot directly compute an optimal solution to the original instance I from Lemma 9, since we do not know the values of a_j in Lemma 9. However, by Lemma 5, Lemma 8 and Lemma 9, we know that the best one among all opt*(x, y, i_1, ..., i_n) with x = m' gives an optimal solution to the original instance I. So our algorithm consists of the following three main steps.

1. Compute all opt(x, y, i_1, ..., i_n) by calling the subalgorithm OPT;
2. Compute all opt*(x, y, i_1, ..., i_n) with x = m' from opt(m', y, i_1, ..., i_n);
3. Find the best one among all opt*(x, y, i_1, ..., i_n) with x = m'.

The subalgorithm OPT in Step 1 is a dynamic programming algorithm that computes all opt(x, y, i_1, ..., i_n) in order of increasing x. Before presenting the whole procedure of OPT, we introduce the idea behind it.

Assume that all opt(x', y, i_1, ..., i_n) for x' < x have been computed. We use the following idea to compute opt(x, y, i_1, ..., i_n).

We use r to denote the non-manipulator at the xth position of the core of π, i.e., the xth non-manipulator in π is Agent r. Assume that opt(x, y, i_1, ..., i_n) ≠ ⊥. Let π_x be the policy corresponding to opt(x, y, i_1, ..., i_n). We further assume that the last segment of π_x consists of q occurrences of the manipulator and one occurrence of Agent r. Then opt(x, y, i_1, ..., i_n) is given by the allocations L_1 of items to the first x − 1 segments of π_x plus the allocations L_2 of items to the last segment of π_x.

Since we require that the allocation sequence in opt(x, y, i_1, ..., i_n) is greedy, we know that L_2 is given by q allocations of the first q remaining items in Agent r's preference ranking to the manipulator, plus one allocation of the (q + 1)th remaining item to Agent r. Furthermore, the last item allocated to Agent r must be the i_r-th item in Agent r's preference ranking. By Lemma 6, Lemma 7 and the fact that the utility function is additive, we know that L_1 is given by opt(x − 1, y − q, i_1, ..., i*_r, ..., i_n) for some i*_r ≤ i_r − (q + 1).

However, we do not know the values of q and i*_r. In the algorithm, we try all possible values of q and i*_r. Lemma 5 guarantees that the best one among them is the correct allocation sequence we are seeking. The whole procedure of OPT is presented in Algorithm 1.
Algorithm 1: Subalgorithm OPT

Input: An instance I = (O, N, π, {≻_i}_{i=1}^n) of Best Response.
Output: opt(x, y, i_1, ..., i_n) for all x ∈ {0, 1, ..., m'}, y ∈ {0, 1, ..., k_1} and i_1, i_2, ..., i_n ∈ {0, 1, ..., m}.

1. For all values of x, y, i_1, ..., i_n, set opt(x, y, i_1, ..., i_n) ← ⊥;
2. Set opt(0, 0, 0, ..., 0) ← ∅, the empty (and feasible) partial allocation sequence;
3. For x = 1 to m' do
4.     Let Agent r be the non-manipulator at the xth position of the core of π;
5.     For all i_1, i_2, ..., i_n ∈ {0, 1, ..., m} and 0 ≤ y ≤ k_1(x) do
6.         For q ∈ {0, 1, ..., k_1(x)} do
7.             If there is a value i*_r ≤ i_r − (q + 1) such that opt(x − 1, y − q, i_1, ..., i*_r, ..., i_n) ≠ ⊥, and after executing opt(x − 1, y − q, i_1, ..., i*_r, ..., i_n) the (q + 1)th remaining item in Agent r's preference ranking is exactly the i_r-th item in Agent r's whole preference ranking, then
8.                 Let opt' be opt(x − 1, y − q, i_1, ..., i*_r, ..., i_n) plus q allocations of the first q remaining items in Agent r's preference ranking to the manipulator and one allocation of the (q + 1)th remaining item to Agent r;
9.                 Let opt(x, y, i_1, ..., i_n) be the better of opt' and the current opt(x, y, i_1, ..., i_n).
Next, we analyze the running time of the whole algorithm. The algorithm contains three main steps.

The first step is to compute opt(x, y, i_1, ..., i_n) for i_1, i_2, ..., i_n ∈ {0, 1, ..., m}, x ∈ {0, 1, ..., m' = m − k_1} and y ∈ {0, 1, ..., k_1}. In total, there are (1 + m)^{n−1}(m − k_1 + 1)(k_1 + 1) < (1 + m)^{n+1} subproblems to be solved. Each subproblem opt(x, y, i_1, ..., i_n) with x > 0 is computed by Steps 3 to 9 of OPT. In Step 6, there are k_1(x) + 1 iterations, and each iteration takes at most O(i_r m) = O(m^2) time. So for each subproblem opt(x, y, i_1, ..., i_n), our algorithm uses at most O(m^3) time, and OPT runs in O((1 + m)^{n+4}) time.

The second step takes at most O((1 + m)^{n+2}) time to extend all opt(x = m', y, i_1, ..., i_n) to opt*(x = m', y, i_1, ..., i_n).

The third step is to find the best one among all opt*(x = m', y, i_1, ..., i_n), which can be done in O((1 + m)^{n+1}) time.
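For very small instances, the exhaustive search suggested by Lemma 5 (enumerate every policy dominated by π and run GreedyAlg on each) can serve as a reference against which to test the dynamic program. The sketch below is our own code, reuses the greedy_alg function from the earlier sketch, and is exponential in k_1; it is not the polynomial-time algorithm of this section.

from itertools import combinations

def manipulator_utility(allocation, utility, manipulator=1):
    return sum(utility[item] for item, agent in allocation if agent == manipulator)

def best_response_by_enumeration(policy, prefs, utility, manipulator=1):
    """Brute-force search over all policies dominated by the input policy,
    keeping the best greedy outcome for the manipulator (Lemma 5)."""
    m = len(policy)
    z = [i for i, a in enumerate(policy) if a == manipulator]   # 0-indexed manipulator positions
    rest = [a for a in policy if a != manipulator]              # the core
    best = None
    for z_prime in combinations(range(m), len(z)):
        # A dominated policy keeps the core and moves each manipulator position no earlier.
        if any(zp < zo for zp, zo in zip(z_prime, z)):
            continue
        dominated, it = [], iter(rest)
        for pos in range(m):
            dominated.append(manipulator if pos in z_prime else next(it))
        allocation, strategy = greedy_alg(dominated, prefs, manipulator)
        value = manipulator_utility(allocation, utility, manipulator)
        if best is None or value > best[0]:
            best = (value, strategy)
    return best  # (utility of the best bundle, a picking strategy achieving it)

# Example 1 with illustrative (assumed) utilities consistent with Agent 1's ranking.
prefs = {1: list("abcde"), 2: list("cbeda"), 3: list("ebdca")}
utility = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
print(best_response_by_enumeration([1, 3, 2, 2, 1], prefs, utility))   # (9, ['b', 'a'])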
Theorem 10 Best Response can be solved in O((1 + m)^{n+4}) time. For each constant number n of agents, Best Response is polynomially solvable.

A 0.5-Approximation Algorithm

Although for each fixed number of agents the manipulating sequential allocation problem can be solved in polynomial time, the running time is exponential in the number n of agents. When n is large, the algorithm will still be slow. So we also consider approximation algorithms for the problem. We prove the following theorem.
Best Response with additive utility functions, if the ma-nipulator takes the truthful preference ranking as its picking strategy, it can get a bundle withthe total utility being at least half of that of the optimal solution.Proof.
Let I (cid:48) be the corresponding crucial instance of the input instance I . By Corollary 1, weknow that an optimal solution to I (cid:48) is an optimal solution to I . By Lemma 1, we also know thata solution to I is at least as good as that to I (cid:48) under the same picking strategy. So we only needto prove the theorem holds for crucial instance I (cid:48) and next we assume that the input instanceis crucial.We use ξ A and ξ B to denote the allocations by taking the optimal picking strategy and bytaking the truthful preference as the picking strategy, respectively. Let A = { a , a , . . . , a k } bethe bundle obtained by ξ A and B = { b , b , . . . , b k } be the bundle obtained by ξ B , where weassume that the items in the above two sets are listed according to the picking order.We first prove that for any index 1 ≤ i ≤ k such that a i (cid:31) b i , the item a i is also in B .Assume to the contrary that a i (cid:54)∈ B , which means that item a i is not taken into the solutionin ξ B . The allocation of item a i in ξ A and the allocation of item b i in ξ B happen as the sameposition of the policy, say the x th position. In ξ B , an item b i with u ( b i ) < u ( a i ) is allocated atthe x th position, which means that a i has already been allocated to some agent before the x thposition in ξ B as the picking strategy in ξ B is the truthful preference. However, the instance isa crucial instance and the optimal allocation sequence in ξ A is greedy. Item a i is impossible tobe allocated to a non-manipulator before the x th position in ξ B . Then a i can only be allocatedto the manipulator in ξ B , which is a contradiction to the assumption that a i (cid:54)∈ B . So the aboveclaim holds.Let L = { i , i , . . . , i l } be the set of indices i j such that a i j (cid:31) b i j . Let L = { , , . . . , k } \ L .Note that L is not empty and index 1 is always in L . By the above claim, we have that (cid:88) i ∈ L u ( a i ) < (cid:88) b ∈ B u ( b ) . By the definitions of L and L , we have that (cid:88) i ∈ L u ( a i ) ≤ (cid:88) i ∈ L u ( b i ) ≤ (cid:88) b ∈ B u ( b ) . By summing up the above two inequalities, we get that (cid:88) a ∈ A u ( a ) < (cid:88) b ∈ B u ( b ) . We also give a simple example, where the approximation ratio cannot be 0 . (cid:15) (cid:48) for anyconstant (cid:15) (cid:48) >
0. This shows that the approximation ratio of 0.5 is tight. There are three items O = {g_1, g_2, g_3} to be allocated to two agents N = {1, 2}. The preference rankings are ≻_1: g_1, g_2, g_3 and ≻_2: g_2, g_3, g_1. The policy is π: 121. The utility function of the manipulator is u(g_1) = 1, u(g_2) = 1 − ε and u(g_3) = ε. If the manipulator uses the picking strategy g_2 g_1 g_3, it gets items g_1 and g_2 with utility 2 − ε. If the manipulator uses the truthful preference ranking g_1 g_2 g_3 as the picking strategy, it only gets items g_1 and g_3 with utility 1 + ε. The approximation ratio is (1 + ε)/(2 − ε) = 0.5 + 1.5ε/(2 − ε), where 1.5ε/(2 − ε) can be arbitrarily small.

Conclusion

Best Response can be regarded as one of the hardest natural problems in manipulating sequential allocation, since most other problems can be reduced to it. It has been known for years that
Best Response with only two agents can be solved in polynomial time. However, it took more effort to establish the NP-hardness of
Best Response with an unbounded number of agents. In this paper, we close the gap by showing that
Best Response is polynomially solvable for any constant number of agents. Furthermore, we show that we can always get a 0.5-approximation solution by taking the truthful preference ranking of the manipulator as its picking strategy, and the ratio 0.5 is tight as far as the truthful preference ranking is used. It may be interesting to study the approximation ratio achieved by the truthful response for other non-strategyproof problems.
References

[1] Abdulkadiroglu, A., Pathak, P., Roth, A.E., Sonmez, T.: Changing the Boston school choice mechanism. Tech. rep., National Bureau of Economic Research (2006)
[2] Aziz, H., Bouveret, S., Lang, J.: Manipulating sequential allocation: an overview (2017), online manuscript
[3] Aziz, H., Bouveret, S., Lang, J., Mackenzie, S.: Complexity of manipulating sequential allocation. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. pp. 328-334 (2017)
[4] Aziz, H., Gaspers, S., Mackenzie, S., Mattei, N., Narodytska, N., Walsh, T.: Manipulating the probabilistic serial rule. In: Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2015, Istanbul, Turkey, May 4-8, 2015. pp. 1451-1459 (2015)
[5] Aziz, H., Goldberg, P., Walsh, T.: Equilibria in sequential allocation. In: Algorithmic Decision Theory - 5th International Conference, ADT 2017, Luxembourg, Luxembourg, October 25-27, 2017, Proceedings. pp. 270-283 (2017)
[6] Aziz, H., Walsh, T., Xia, L.: Possible and necessary allocations via sequential mechanisms. In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015. pp. 468-474 (2015)
[7] Bouveret, S., Lang, J.: A general elicitation-free protocol for allocating indivisible goods. In: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011. pp. 73-78 (2011)
[8] Bouveret, S., Lang, J.: Manipulating picking sequences. In: ECAI 2014 - 21st European Conference on Artificial Intelligence, 18-22 August 2014, Prague, Czech Republic - Including Prestigious Applications of Intelligent Systems (PAIS 2014). pp. 141-146 (2014)
[9] Brams, S.J., Taylor, A.D.: The Win-Win Solution - Guaranteeing Fair Shares to Everybody. W. W. Norton & Company (2000)
[10] Budish, E., Cantillon, E.: The multi-unit assignment problem: Theory and evidence from course allocation at Harvard. The American Economic Review (5), 2237-2271 (2012)
[11] Cechlárová, K., Klaus, B., Manlove, D.F.: Pareto optimal matchings of students to courses in the presence of prerequisites. Discrete Optimization, 174-195 (2018)
[12] Kalinowski, T., Narodytska, N., Walsh, T.: A social welfare optimal sequential allocation procedure. In: IJCAI 2013, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing, China, August 3-9, 2013. pp. 227-233 (2013)
[13] Kohler, D.A., Chandrasekaran, R.: A class of sequential games. Operations Research (2), 270-277 (1971)
[14] Kojima, F., Ünver, M.U.: The "Boston" school-choice mechanism: an axiomatic approach. Economic Theory (3), 515-544 (Apr 2014)
[15] Levine, L., Stange, K.E.: How to make the most of a shared meal: Plan the last bite first. The American Mathematical Monthly 119