Minimizing Time-to-Rank: A Learning and Recommendation Approach
Haoming Li (Duke University, [email protected]), Sujoy Sikdar (Rensselaer Polytechnic Institute, [email protected]), Rohit Vaish (Rensselaer Polytechnic Institute, [email protected]), Junming Wang (Rensselaer Polytechnic Institute, [email protected]), Lirong Xia (Rensselaer Polytechnic Institute, [email protected]), and Chaonan Ye (Stanford University, [email protected])
May 30, 2019
Abstract
Consider the following problem faced by an online voting platform: A user is provided with a list of alternatives, and is asked to rank them in order of preference using only drag-and-drop operations. The platform's goal is to recommend an initial ranking that minimizes the time spent by the user in arriving at her desired ranking. We develop the first optimization framework to address this problem, and make theoretical as well as practical contributions. On the practical side, our experiments on Amazon Mechanical Turk provide two interesting insights about user behavior: first, that users' ranking strategies closely resemble selection or insertion sort, and second, that the time taken for a drag-and-drop operation depends linearly on the number of positions moved. These insights directly motivate our theoretical model of the optimization problem. We show that computing an optimal recommendation is NP-hard, and provide exact and approximation algorithms for a variety of special cases of the problem. Experimental evaluation on MTurk shows that, compared to a random recommendation strategy, the proposed approach reduces the (average) time-to-rank by up to …

Introduction

Eliciting preferences in the form of rankings over a set of alternatives is a common task in social choice, crowdsourcing, and in daily life. For example, the organizer of a meeting might ask the participants to rank a set of time-slots based on their individual schedules. Likewise, in an election, voters might be required to rank a set of candidates in order of preference.

Over the years, computerized systems have been increasingly used in carrying out preference elicitation tasks such as the ones mentioned above. Indeed, recently there has been a proliferation of online voting platforms such as CIVS (https://civs.cs.cornell.edu/), OPRA (opra.io), Pnyx (https://pnyx.dss.in.tum.de/), RoboVote (http://robovote.org/), and Whale (https://whale.imag.fr/). In many of these platforms, a user is presented with an arbitrarily ordered list of alternatives, and is asked to shuffle them around in-place using drag-and-drop operations until her desired preference ordering is achieved. Figure 1 illustrates the use of drag-and-drop operations in sorting a given list of numbers.

Figure 1: Sorting via drag-and-drop operations.

Our focus in this work is on time-to-rank, i.e., the time it takes for a user to arrive at her desired ranking, starting from a ranking suggested by the platform and using only drag-and-drop operations. We study this problem from the perspective of the voting platform that wants to recommend an optimal initial ranking to the user (i.e., one that minimizes time-to-rank). Time to accomplish a designated task is widely considered as a key consideration in the usability of automated systems (Bevan et al., 2015; Albert and Tullis, 2013), and serves as a proxy for user effort. Indeed, 'time on task' was identified as a key factor in the usability and efficiency of computerized voting systems in a 2004 report by NIST to the U.S. Congress for the Help America Vote Act (HAVA) (Laskowski et al., 2004). In crowdsourcing, too, time on task plays a key role in the recruitment of workers, the quality of worker participation, and in determining payments (Cheng et al., 2015; Maddalena et al., 2016).

Note that the initial ranking suggested by the platform can have a significant impact on the time spent by the user on the ranking task.
Indeed, if the user's preferences are known beforehand, then the platform can simply recommend her preferred ranking, and she will only need to verify that the ordering is correct. In practice, however, users' preferences are often unknown. Furthermore, users employ a wide variety of ranking strategies, and, based on their proficiency with the interface, users can have very different drag-and-drop times. All these factors make the task of predicting the time-to-rank and finding an optimal recommendation challenging and non-trivial.

We emphasize the subtle difference between our problem and that of preference elicitation. The latter involves repeatedly asking questions to the users (e.g., in the form of pairwise comparisons between alternatives) to gather enough information about their preferences. By contrast, our problem involves a one-shot recommendation followed by a series of drag-and-drop operations by the user until her desired ranking is achieved. There is an extensive literature on preference elicitation (Conen and Sandholm, 2001; Conitzer and Sandholm, 2002; Blum et al., 2004; Boutilier, 2013; Busa-Fekete et al., 2014; Soufiani et al., 2013; Zhao et al., 2018). Yet, somewhat surprisingly, the problem of recommending a ranking that minimizes users' time and effort has received little attention. Our work aims to address this gap.
Our Contributions
We make contributions on three fronts:

• On the conceptual side, we propose the problem of minimizing time-to-rank and outline a framework for addressing it (Figure 2).

• On the theoretical side, we formulate the optimization problem of finding a recommendation to minimize time-to-rank (Section 4). We show that computing an optimal recommendation is NP-hard, even under highly restricted settings (Theorem 3). We complement the intractability results by providing a number of exact (Theorem 2) and approximation algorithms (Theorems 4 to 6) for special cases of the problem.

• We use experimental analysis for the dual purpose of motivating our modeling assumptions as well as justifying the effectiveness of our approach (Section 5). Our experiments on Amazon Mechanical Turk reveal two insights about user behavior (Section 5.1): (1) the ranking strategies of real-world users closely resemble insertion/selection sort, and (2) the drag-and-drop time of an alternative varies linearly with the distance moved. Additionally, we find that a simple adaptive strategy (based on the Borda count voting rule) can reduce time-to-rank by up to … compared to a random recommendation strategy (Section 5.2), validating the usefulness of the proposed framework.
Figure 2: High-level overview of our framework. Our technical contributions are highlighted in blue.
Figure 2 illustrates the proposed framework, which consists of three key steps. In Step 1, we learn user preferences from historical data by developing a statistical ranking model, typically in the form of a distribution D over the space of all rankings (refer to Section 2 for examples of ranking models). In Step 2, which runs in parallel to Step 1, we learn user behavior; in particular, we identify their sorting strategies (Section 3.1) as well as their drag-and-drop times (Section 3.2). Together, these two components define the time function, which models the time taken by a user in transforming a given initial ranking σ into a target ranking τ, denoted by time(σ, τ). The ranking model D from Step 1 and the time function from Step 2 together define the recommendation problem in Step 3, called (D, w)-Recommendation (the parameter w is closely related to the time function; we elaborate on this below). This is the optimization problem of computing a ranking σ that minimizes the expected time-to-rank of the user, i.e., minimizes E_{τ∼D}[time(σ, τ)]. The user is then recommended σ, and her preference history is updated.

The literature on learning statistical ranking models is already well-developed (Guiver and Snelson, 2009; Awasthi et al., 2014; Lu and Boutilier, 2014; Zhao et al., 2016). Thus, while this is a key ingredient of our framework (Step 1), in this work we choose to focus on Steps 2 and 3, namely, learning user behavior and solving the recommendation problem.

Recall that the time function defines the time taken by a user in transforming a given ranking σ into a target ranking τ. For a user who follows a fixed sorting algorithm (e.g., insertion or selection sort), the time function can be broken down into (1) the number of drag-and-drop operations suggested by the sorting algorithm, and (2) the (average) time taken for each drag-and-drop operation by the user. As we will show in Lemma 1 in Section 3.1, point (1) above is independent of the choice of the sorting algorithm. Therefore, the time function can be equivalently defined in terms of the weight function w, which describes the time taken by a user, denoted by w(ℓ), in moving an alternative by ℓ positions via a drag-and-drop operation. For this reason, we use w in the formulation of (D, w)-Recommendation.
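To make Step 3 concrete, the following is a minimal sketch of the optimization objective, assuming the ranking model D is represented by a list of sampled target rankings and the time function is passed in as a callable. The function names (expected_time, best_recommendation) are ours, not the paper's:

```python
def expected_time(sigma, samples, time_fn):
    # Monte-Carlo estimate of E_{tau ~ D}[time(sigma, tau)], with the
    # ranking model D represented by a list of sampled target rankings.
    return sum(time_fn(sigma, tau) for tau in samples) / len(samples)

def best_recommendation(candidates, samples, time_fn):
    # Step 3 of the framework: recommend the candidate ranking minimizing
    # the estimated expected time-to-rank. (Brute force over a candidate
    # set; Section 4 studies the complexity of the exact problem.)
    return min(candidates, key=lambda s: expected_time(s, samples, time_fn))
```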
Applicability. Our framework is best suited for users who have already formed their preferences, so that the recommended ranking does not bias their preferences. This is a natural assumption in some applications, such as in the meeting organization example in Section 1. In general, however, it is possible that a user who is undecided between options A and B might prefer A over B if presented in that order by the recommended ranking. A careful study of such biases (aka the "framing effect") is an interesting direction for future work.
Additional Related Work. Our work is related to the literature on inferring a ground truth ordering from noisy information (Braverman and Mossel, 2008), and on aggregating preferences by minimizing some notion of distance to the observed rankings, such as the total Kendall's Tau distance (Procaccia and Shah, 2016). Previous work on preference learning and learning to rank can also be integrated in our framework (Liu, 2011; Lu and Boutilier, 2014; Khetan and Oh, 2016; Agarwal, 2016; Negahban et al., 2017; Zhao and Xia, 2018).
Preliminaries
Let A = {a_1, . . . , a_m} denote a set of m alternatives, and let L(A) be the set of all linear orders over A. For any σ ∈ L(A), a_i ≻_σ a_j denotes that a_i is preferred over a_j under σ, and σ(k) denotes the k-th most preferred alternative in σ. A set of n linear orders {σ^(1), . . . , σ^(n)} is called a preference profile.

Definition 1 (Kendall's Tau distance; Kendall, 1938). Given two linear orders σ, σ′ ∈ L(A), the Kendall's Tau distance d_kt(σ, σ′) is the number of pairwise disagreements between σ and σ′. That is, d_kt(σ, σ′) := Σ_{a_i, a_j ∈ A} 1[a_j ≻_{σ′} a_i and a_i ≻_σ a_j], where 1 is the indicator function.

Definition 2 (Plackett-Luce model; Plackett, 1975; Luce, 1959). Let θ := (θ_1, . . . , θ_m) be such that θ_i ∈ (0, 1) for each i ∈ [m] and Σ_{i∈[m]} θ_i = 1. Let Θ denote the corresponding parameter space. The Plackett-Luce (PL) model parameterized by θ ∈ Θ defines a distribution over the set of linear orders L(A) as follows: the probability of generating σ := (a_{i_1} ≻ a_{i_2} ≻ . . . ≻ a_{i_m}) is given by

Pr(σ | θ) = (θ_{i_1} / Σ_{ℓ=1}^{m} θ_{i_ℓ}) · (θ_{i_2} / Σ_{ℓ=2}^{m} θ_{i_ℓ}) · · · · · (θ_{i_{m−1}} / (θ_{i_{m−1}} + θ_{i_m})).

More generally, a k-mixture Plackett-Luce model (k-PL) is parameterized by {γ^(ℓ), θ^(ℓ)}_{ℓ=1}^{k}, where Σ_{ℓ=1}^{k} γ^(ℓ) = 1, γ^(ℓ) ≥ 0 for all ℓ ∈ [k], and θ^(ℓ) ∈ Θ for all ℓ ∈ [k]. The probability of generating σ ∈ L(A) is given by Pr(σ | {γ^(ℓ), θ^(ℓ)}_{ℓ=1}^{k}) = Σ_{ℓ=1}^{k} γ^(ℓ) Pr(σ | θ^(ℓ)).

Definition 3 (Mallows model; Mallows, 1957). The Mallows model (MM) is specified by a reference ranking σ* ∈ L(A) and a dispersion parameter φ ∈ (0, 1]. The probability of generating a ranking σ is given by Pr(σ | σ*, φ) = φ^{d_kt(σ, σ*)} / Z, where Z = Σ_{σ′∈L(A)} φ^{d_kt(σ′, σ*)}.

More generally, a k-mixture Mallows model (k-MM) is parameterized by {γ^(ℓ), σ*^(ℓ), φ^(ℓ)}_{ℓ=1}^{k}, where Σ_{ℓ=1}^{k} γ^(ℓ) = 1, γ^(ℓ) ≥ 0 for all ℓ ∈ [k], and σ*^(ℓ) ∈ L(A), φ^(ℓ) ∈ (0, 1] for all ℓ ∈ [k]. The probability of generating σ ∈ L(A) is given by Pr(σ | {γ^(ℓ), σ*^(ℓ), φ^(ℓ)}_{ℓ=1}^{k}) = Σ_{ℓ=1}^{k} γ^(ℓ) Pr(σ | σ*^(ℓ), φ^(ℓ)).

Definition 4 (Uniform distribution). Under the uniform distribution (Unif) supported on a preference profile {σ^(i)}_{i=1}^{n}, the probability of generating σ ∈ L(A) is 1/n if σ ∈ {σ^(i)}_{i=1}^{n} and 0 otherwise.
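As a quick illustration of Definitions 1 and 2, the following sketch (our own illustrative code, with rankings represented as Python tuples listing alternatives best-first) computes the Kendall's Tau distance and draws one sample from a Plackett-Luce model:

```python
import random

def kendall_tau(sigma, tau):
    # d_kt(sigma, tau): number of pairs on which the two rankings disagree
    # (Definition 1).
    pos = {a: k for k, a in enumerate(tau)}
    m = len(sigma)
    return sum(1 for i in range(m) for j in range(i + 1, m)
               if pos[sigma[i]] > pos[sigma[j]])

def sample_plackett_luce(alternatives, theta, rng=random):
    # Draw one ranking from a Plackett-Luce model (Definition 2): repeatedly
    # select the next alternative with probability proportional to its theta.
    remaining, weights = list(alternatives), list(theta)
    ranking = []
    while remaining:
        a = rng.choices(remaining, weights=weights)[0]
        i = remaining.index(a)
        remaining.pop(i)
        weights.pop(i)
        ranking.append(a)
    return tuple(ranking)
```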
In this section, we will model the time spent by the user in transforming the recommended ranking σ into the target ranking τ. Our formulation involves the sorting strategy of the user (Section 3.1) as well as her drag-and-drop time (Section 3.2).

A sorting algorithm takes as input a ranking σ ∈ L(A) and performs a sequence of drag-and-drop operations until the target ranking is achieved. At each step, an alternative is moved from its current position to another (possibly different) position, and the current ranking is updated accordingly. Below we describe two well-known examples of sorting algorithms: selection sort and insertion sort. Let σ^(k) denote the current list at time step k ∈ {1, 2, . . .} (i.e., before the sorting operation at time step k takes place). Thus, σ^(1) = σ. For any σ ∈ L(A), define the k-prefix set of σ as P_k(σ) := {σ(1), σ(2), . . . , σ(k)} (where P_0(σ) := ∅) and the corresponding suffix set as S_k(σ) := A \ P_k(σ).
Selection Sort. Let a_i denote the most preferred alternative according to τ in the set S_{k−1}(σ^(k)). At step k of selection sort, the alternative a_i is promoted to a position such that the top k alternatives in the new list are ordered according to τ. Note that this step is well-defined only under the sorted-prefix property, i.e., at the beginning of step k of the algorithm, the alternatives in P_{k−1}(σ^(k)) are sorted according to τ. This property is maintained by selection sort.

Insertion Sort. Let a_i denote the most preferred alternative in S_{k−1}(σ^(k)) according to σ^(k). At step k of insertion sort, the alternative a_i is promoted to a position such that the top k alternatives in the new list are ordered according to τ. Note that this step is well-defined only under the sorted-prefix property, which is maintained by insertion sort.
Sorting Algorithms. In this work, we will be concerned with sorting algorithms that involve a combination of insertion and selection sort. Specifically, we will use the term sorting algorithm to refer to a sequence of steps s_1, s_2, . . . such that each step s_k corresponds to either selection or insertion sort, i.e., s_k ∈ {SEL, INS} for every k. If s_k = SEL, then the algorithm promotes the most preferred alternative in S_{k−1}(σ^(k)) (according to τ) to a position such that the top k alternatives in the new list are ordered according to τ. If s_k = INS, then the algorithm promotes the most preferred alternative in S_{k−1}(σ^(k)) (according to σ^(k)) to a position such that the top k alternatives in the new list are ordered according to τ.

For example, in Figure 1, starting from the recommended list at the extreme left, the user performs a selection sort operation (promoting 19 to the top of the current list), followed by an insertion sort operation (promoting 30 to its correct position in the sorted prefix), followed by either a selection or an insertion sort operation (promoting 23 to its correct position). We will denote a generic sorting algorithm by A and the class of all sorting algorithms by 𝒜.
Count Function. Given a sorting algorithm A, a source ranking σ ∈ L(A), and a target ranking τ ∈ L(A), the count function f^{σ→τ}_A : [m − 1] → Z_+ ∪ {0} keeps track of the number of drag-and-drop operations (and the number of positions by which some alternative is moved in each such operation) during the execution of A. Formally, f^{σ→τ}_A(ℓ) is the number of times some alternative is 'moved up by ℓ positions' during the execution of algorithm A when the source and target rankings are σ and τ, respectively.

For example, let A be insertion sort, σ = (d, c, a, b), and τ = (a, b, c, d). In step 1, the user considers the alternative d and no move-up operation is required. In step 2, the user promotes c by one position (since c ≻_τ d) to obtain the new list (c, d, a, b). In step 3, the user promotes a by two positions to obtain (a, c, d, b). Finally, the user promotes b by two positions to obtain the target list (a, b, c, d). Overall, the user performs one 'move up by one position' operation and two 'move up by two positions' operations. Hence, f^{σ→τ}_A(1) = 1, f^{σ→τ}_A(2) = 2, and f^{σ→τ}_A(3) = 0. We will write moves(σ, τ) for the total number of drag-and-drop operations performed by A, i.e., moves(σ, τ) = Σ_{ℓ=1}^{m−1} f^{σ→τ}_A(ℓ).
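The count function is easy to compute by direct simulation. The sketch below (our own illustrative code) runs the generic sorting algorithm of this section for a given SEL/INS step sequence and reproduces the worked example above:

```python
def count_function(sigma, tau, steps):
    # Simulate the generic sorting algorithm of Section 3.1 and return the
    # count function f^{sigma -> tau}: f[l] is the number of 'move up by l
    # positions' operations. `steps` is a string over {'S', 'I'} choosing
    # selection or insertion sort at each step (treated as 'I' if short).
    tpos = {a: i for i, a in enumerate(tau)}
    cur = list(sigma)
    m = len(cur)
    f = [0] * m                      # f[l] for l = 1, ..., m-1; f[0] unused
    for k in range(m):               # invariant: cur[:k] is sorted by tau
        suffix = cur[k:]
        if k < len(steps) and steps[k] == 'S':
            a = min(suffix, key=lambda x: tpos[x])   # SEL: tau-best of suffix
        else:
            a = suffix[0]                            # INS: first of suffix
        old = cur.index(a)
        cur.pop(old)
        new = 0                      # tau-correct position within the prefix
        while new < k and tpos[cur[new]] < tpos[a]:
            new += 1
        cur.insert(new, a)
        if old > new:
            f[old - new] += 1
    assert cur == list(tau)
    return f

# The worked example above: insertion sort, sigma = (d,c,a,b), tau = (a,b,c,d).
f = count_function(("d", "c", "a", "b"), ("a", "b", "c", "d"), "IIII")
assert f[1] == 1 and f[2] == 2 and f[3] == 0
```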
Remark 1. Notice the difference between the number of drag-and-drop operations and the total distance moved: in the example above (σ = (d, c, a, b) and τ = (a, b, c, d)), moves(σ, τ) = 3, but the total distance moved is Σ_ℓ ℓ · f^{σ→τ}(ℓ) = 1 · 1 + 2 · 2 = 5. The latter quantity is equal to d_kt(σ, τ).
Lemma 1. For any two sorting algorithms A, A′ ∈ 𝒜, any σ, τ ∈ L(A), and any ℓ ∈ [m − 1], f^{σ→τ}_A(ℓ) = f^{σ→τ}_{A′}(ℓ).

In light of Lemma 1, we will hereafter drop the subscript A and simply write f^{σ→τ} instead of f^{σ→τ}_A. The proof of Lemma 1 appears in Section 7.3.
Weight Function. The weight function w : [m − 1] → R_{≥0} models the time taken for each drag-and-drop operation; specifically, w(ℓ) denotes the time taken by the user in moving an alternative up by ℓ positions. Of particular interest to us will be the linear weight function w_lin(ℓ) = ℓ for each ℓ ∈ [m − 1], and the affine weight function w_aff(ℓ) = cℓ + d for each ℓ ∈ [m − 1] and fixed constants c, d ∈ N.

Notice that we do not keep track of which alternative is moved by ℓ positions; indeed, we believe it is reasonable to assume that moving one alternative up by ℓ positions takes the same time as moving any other. Also, we do not need to define the count function for move-down operations, as neither selection sort nor insertion sort will ever make such a move. Here, 'time taken' includes the time spent in thinking about which alternative to move as well as actually carrying out the move.
Distribution D                    Linear weights:                   Linear weights:               Linear weights:                           General weights:
                                  Hardness                          Exact Algo.                   Approx. Algo.                             Approx. Algo.
k-mixture Plackett-Luce (k-PL)    NP-c even for k = 4 (Theorem 3)   Poly for k = 1 (Theorem 2)    PTAS (Theorem 4); 5-approx. (Theorem 5)   αβ-approx. (Theorem 6)
k-mixture Mallows (k-MM)          NP-c even for k = 4 (Theorem 3)   Poly for k = 1 (Theorem 2)    PTAS (Theorem 4); 5-approx. (Theorem 5)   αβ-approx. (Theorem 6)
Uniform (Unif)                    NP-c even for n = 4 (Theorem 3)   Poly for n ∈ {1, 2} (Theorem 2)   PTAS (Theorem 4); 5-approx. (Theorem 5)   αβ-approx. (Theorem 6)

Table 1: Computational complexity results for (D, w)-Recommendation. Each row corresponds to a preference model and each column corresponds to a weight function. We use the shorthands Poly, NP-c, PTAS, and αβ-approx. to denote polynomial-time (exact) algorithm, NP-complete, polynomial-time approximation scheme, and αβ-approximation algorithm, respectively. The parameters α and β capture how closely a given weight function approximates a linear weight function; see Definition 6.
Time Function. Given the count function f^{σ→τ} and the weight function w, the time function is defined as their inner product, i.e., time_w(σ, τ) = ⟨f^{σ→τ}, w⟩ = Σ_{ℓ=1}^{m−1} f^{σ→τ}(ℓ) · w(ℓ).

Theorem 1 shows that for the linear weight function w_lin, time is equal to the Kendall's Tau distance, and for the affine weight function, time is equal to a weighted combination of the Kendall's Tau distance and the total number of moves.
Theorem 1. For any σ, τ ∈ L(A), time_{w_lin}(σ, τ) = d_kt(σ, τ) and time_{w_aff}(σ, τ) = c · d_kt(σ, τ) + d · moves(σ, τ).

The proof of Theorem 1 appears in Section 7.4.
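For the running example from Section 3.1 (σ = (d, c, a, b) and τ = (a, b, c, d), with f^{σ→τ}(1) = 1 and f^{σ→τ}(2) = 2), Theorem 1 can be checked directly:

time_{w_lin}(σ, τ) = 1 · w_lin(1) + 2 · w_lin(2) = 1 · 1 + 2 · 2 = 5 = d_kt(σ, τ), and
time_{w_aff}(σ, τ) = c · 5 + d · 3, since moves(σ, τ) = 1 + 2 = 3.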
The Recommendation Problem

We model the recommendation problem as the following computational problem: given the preference distribution D of the user and her time function (which, in turn, is determined by the weight function w), find a ranking that minimizes the expected time taken by the user to transform the recommended ranking σ into her preference τ.

Definition 5 ((D, w)-Recommendation). Given a distribution D over L(A), a weight function w, and a number δ ∈ Q, does there exist σ ∈ L(A) such that E_{τ∼D}[time_w(σ, τ)] ≤ δ?

We will focus on settings where the distribution D is Plackett-Luce, Mallows, or Uniform, and the weight function w is Linear, Affine, or General. Note that if the quantity E_{τ∼D}[time_w(σ, τ)] can be computed in polynomial time for a given distribution D and weight function w, then (D, w)-Recommendation is in NP.

Our computational results for (D, w)-Recommendation are summarized in Table 1. We show that this problem is NP-hard, even when the weight function is linear (Theorem 3). On the algorithmic side, we provide a polynomial-time approximation scheme (PTAS) and a 5-approximation algorithm for the linear weight function (Theorems 4 and 5), and an approximation scheme for non-linear weights (Theorem 6).

Theorem 2 (Exact Algorithms). (D, w)-Recommendation is solvable in polynomial time when w is linear and D is either (a) k-mixture Plackett-Luce (k-PL) with k = 1, (b) k-mixture Mallows model (k-MM) with k = 1, or (c) a uniform distribution with support size n ≤ 2.

Theorem 3 (Hardness Results). (D, w)-Recommendation is NP-complete even when w is linear and D is either (a) k-mixture Plackett-Luce model (k-PL) for k = 4, (b) k-mixture Mallows model (k-MM) for k = 4, or (c) a uniform distribution over n = 4 linear orders.

Theorem 4 (PTAS). (D, w)-Recommendation admits a polynomial-time approximation scheme (PTAS) when w is linear and D is either (a) k-mixture Plackett-Luce model (k-PL) for k ∈ N, (b) k-mixture Mallows model (k-MM) for k ∈ N, or (c) a uniform distribution (Unif).

The PTAS in Theorem 4 is quite complicated and is primarily of theoretical interest (indeed, for any fixed ε > 0, the running time of the algorithm is m^{Õ(1/ε)}, making it difficult to apply in experiments). A simpler and more practical algorithm (although with a worse approximation guarantee) is based on the well-known Borda count voting rule (Theorem 5).

Theorem 5 (5-Approximation). (D, w)-Recommendation admits a polynomial-time 5-approximation algorithm when w is linear and D is either (a) k-mixture Plackett-Luce model (k-PL) for k ∈ N, (b) k-mixture Mallows model (k-MM) for k ∈ N, or (c) a uniform distribution (Unif).

Our next result (Theorem 6) provides an approximation guarantee for (D, w)-Recommendation that applies to non-linear weight functions, as long as they are "close" to the linear weight function in the following sense:

Definition 6 (Closeness-of-weights). A weight function w is said to be (α, β)-close to another weight function w′ if there exist α, β ≥ 1 such that for every ℓ ∈ [m − 1], we have w′(ℓ)/β ≤ w(ℓ) ≤ α w′(ℓ).

For any (possibly non-linear) weight function w that is (α, β)-close to the linear weight function w_lin, Theorem 6 provides an αβ-approximation scheme for (D, w)-Recommendation.
Theorem 6 (Approximation for General Weights). Given any ε > 0 and any weight function w that is (α, β)-close to the linear weight function w_lin, there exists an algorithm that runs in time m^{Õ(1/ε)} and returns a linear order σ such that E_{τ∼D}[time_w(σ, τ)] ≤ αβ(1 + ε) · E_{τ∼D}[time_w(σ*, τ)], where σ* ∈ arg min_{σ′∈L(A)} E_{τ∼D}[time_w(σ′, τ)].
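To give a flavor of the Borda-style algorithm behind Theorem 5, here is a minimal sketch, assuming access to the pairwise marginals Pr_{τ∼D}(a ≻_τ b) of the distribution (Section 7.1 derives these for k-PL and k-MM); the function name is ours:

```python
def borda_recommendation(alternatives, pairwise_prob):
    # Borda-style ordering in the spirit of Theorem 5: rank alternatives by
    # the expected score sum over b != a of Pr_{tau ~ D}(a beats b), i.e.,
    # order the tournament vertices by weighted number of wins
    # (Coppersmith et al., 2010; see Proposition 3 in Section 7.2).
    def score(a):
        return sum(pairwise_prob(a, b) for b in alternatives if b != a)
    return tuple(sorted(alternatives, key=score, reverse=True))
```

For a single Plackett-Luce component, pairwise_prob(a, b) = θ_a/(θ_a + θ_b), and this ordering coincides with sorting by θ, i.e., with the exact algorithm of Theorem 2(a).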
Remark 2. Notice that the PTAS of Theorem 4 is applicable for any affine weight function w_aff = c · w_lin + d for fixed constants c, d ∈ N. As a result, the approximation guarantee of Theorem 6 also extends to any weight function that is (α, β)-close to some affine weight function.

Experiments

We perform two sets of experiments on Amazon Mechanical Turk (MTurk). The first set of experiments (Section 5.1) is aimed at identifying the sorting strategies of the users as well as a model of their drag-and-drop behavior. The observations from these experiments directly motivate the formulation of our theoretical model, which we have already presented in Section 4. The second set of experiments (Section 5.2) is aimed at evaluating the practical usefulness of our approach.

In both sets of experiments, the crowdworkers were asked to sort, in increasing order, a randomly generated list of numbers between 0 and 100 (the specifics about the length of the lists and how they are generated can be found in Sections 5.1 and 5.2). Figure 3 shows an example of the instructions provided to the crowdworkers.

In each experiment, the task length was advertised as 10 minutes, and the payment offered was $0.… per task. The crowdworkers were provided a user interface (see Figure 1) that allows for drag-and-drop operations. To ensure data quality, we removed those workers from the data who failed to successfully order the integers more than 80% of the time, or who did not complete all the polls. We also removed the workers with high variance in their sorting time; in particular, those with a coefficient of variation above the …th percentile. The reported results are for the workers whose data was retained.

Figure 3: Instructions given to the MTurk workers.
To identify user behavior, we performed two experiments: (a) Rank10, where each crowdworker participated in 20 polls, each consisting of a list of 10 integers (between 0 and 100) generated uniformly at random, and (b) Rank5, which is a similar task with 30 polls and lists of length 5. In each poll, we recorded the time taken by a crowdworker to move an alternative (via a drag-and-drop operation) and the number of positions by which the alternative was moved. After the initial pruning (as described above), we retained 9840 polls submitted by 492 workers in the Rank10 experiment, and 10320 polls submitted by 344 workers in the Rank5 experiment. Table 2 summarizes the aggregate statistics. Our observations are discussed below.

                                                               Rank10                      Rank5
                                                        Mean    Median  Std. Dev.   Mean   Median  Std. Dev.
Sorting time                                            24.41   22.65   9.12        7.75   6.99    3.54
Total number of drag-and-drop operations                7.69    8       1.8         2.91   3       1.13
Total number of positions moved during drag-and-drop    22.59   23      5.59        5.05   5       2.01
Number of operations coinciding with sel./ins. sort     5.09    6       2.28        2.21   2       1.06
Kendall's Tau distance between initial and final        22.55   22      5.6         5.04   5       2.01
Table 2: Summary of the user statistics recorded in the experiments in Section 5.1.
Sorting Behavior
Our hypothesis regarding the ranking behavior of human crowdworkers was that they use (some combination of) natural sorting algorithms such as selection sort or insertion sort (Section 3.1). To test this hypothesis, we examined the fraction of the drag-and-drop operations that coincided with an iteration of selection/insertion sort. (Given a ranking σ, a drag-and-drop operation on σ coincides with selection/insertion sort if the order of alternatives resulting from the drag-and-drop operation exactly matches the order of alternatives when one iteration of either selection or insertion sort is applied to σ.) We found that, on average, 2.21/2.91 = 76% of all drag-and-drop operations in Rank5 (and 5.09/7.69 = 66.2% in Rank10) coincided with selection/insertion sort.
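The coincidence test is mechanical to state in code. The sketch below is ours, with one simplifying assumption: the sorted prefix is taken to be the longest prefix of the current list already ordered according to the target (the experiments could instead track the prefix across the whole poll):

```python
def candidate_next_lists(cur, target):
    # The lists that one iteration of selection or insertion sort could
    # produce from `cur`, toward `target`.
    tpos = {a: i for i, a in enumerate(target)}
    k = 1                      # length of the (assumed) sorted prefix
    while k < len(cur) and tpos[cur[k - 1]] < tpos[cur[k]]:
        k += 1
    results = []
    for kind in ("S", "I"):
        lst = list(cur)
        suffix = lst[k:]
        if not suffix:
            results.append(tuple(lst))
            continue
        a = min(suffix, key=lambda x: tpos[x]) if kind == "S" else suffix[0]
        lst.remove(a)
        new = 0
        while new < k and tpos[lst[new]] < tpos[a]:
            new += 1
        lst.insert(new, a)
        results.append(tuple(lst))
    return results

def coincides(before, after, target):
    # The coincidence criterion of Section 5.1: the observed drag-and-drop
    # reproduces one selection- or insertion-sort iteration exactly.
    return tuple(after) in candidate_next_lists(list(before), target)
```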
Drag-and-Drop Behavior

To identify the drag-and-drop behavior of the users, we plot the time-to-rank as a function of the total number of positions by which the alternatives are moved in each poll (Figure 4). Recall from Remark 1 that for an ideal user who uses only insertion/selection sort, the latter quantity is equal to d_kt(σ, τ).

Table 3: Average 5-fold cross-validation MSE over all workers using the best model for each worker (columns: Avg. MSE, √Avg. MSE in seconds, and Avg. Sorting Time in seconds), and the number of users for which each of the models (only d_kt, only moves, and both d_kt and moves) was identified to be the best.
Our hypothesis was that the sorting time varies linearly with the total distance moved during drag-and-drop operations (d_kt(σ, τ)). To verify this, we used linear regression with time-to-rank (or sorting time) as the target variable, and measured the mean squared error (MSE) using 5-fold cross-validation for three different choices of independent variables: (1) only d_kt, (2) only moves, and (3) both d_kt and moves. The best-fit models predict within roughly 26% of the observed times for Rank10 and within roughly 35% for Rank5.
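A sketch of this model comparison (our own illustrative code; it assumes scikit-learn and one array entry per poll for a single worker):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def best_fit_model(d_kt, moves, times):
    # Compare the three regressions of Section 5.1 for one worker via 5-fold
    # cross-validated MSE: time ~ d_kt, time ~ moves, and time ~ d_kt + moves.
    designs = {
        "d_kt": np.c_[d_kt],
        "moves": np.c_[moves],
        "d_kt+moves": np.c_[d_kt, moves],
    }
    y = np.asarray(times, dtype=float)
    mse = {name: -cross_val_score(LinearRegression(), X, y, cv=5,
                                  scoring="neg_mean_squared_error").mean()
           for name, X in designs.items()}
    return min(mse, key=mse.get), mse
```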
Figure 4: Relationship between the number of positions moved and the total sorting time for Rank10 (left) and Rank5 (right).

To evaluate the usefulness of our framework, we compared a random recommendation strategy with one that forms an increasingly accurate estimate of users' preferences over time. Specifically, we first fix the ground truth ranking of alternatives consisting of randomly generated integers between 0 and 100. Each crowdworker then participates in two sets of polls. In one set of polls, the crowdworkers are provided with initial rankings generated by adding independent Gaussian noise to the ground truth (to simulate a random recommendation strategy), and their sorting times are recorded. In the second set of polls, the recommended set of alternatives is the same as under the random strategy, but ordered according to a Borda ranking. Specifically, the ordering in the k-th iteration is determined by the Borda ranking aggregated from the previous k − 1 iterations.
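The adaptive strategy only needs a running Borda aggregate of the rankings submitted so far; a minimal sketch (ours):

```python
from collections import defaultdict

def borda_aggregate(history, alternatives):
    # Borda-count aggregation of previously submitted rankings: position p
    # (0-indexed, best first) in a ranking of length m earns m - 1 - p points.
    m = len(alternatives)
    score = defaultdict(float)
    for ranking in history:
        for p, a in enumerate(ranking):
            score[a] += m - 1 - p
    return tuple(sorted(alternatives, key=lambda a: score[a], reverse=True))
```

In the k-th poll, the recommendation is borda_aggregate applied to the first k − 1 submitted rankings (with an arbitrary order when the history is empty).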
Figure 5: Relationship between sorting time and the number of polls completed by the users, for std. dev. = 10 (left) and std. dev. = 20 (right).

Figure 5 shows the average sorting time of the crowdworkers as a function of the index of the polls under two different noise settings: std. dev. = 10 and std. dev. = 20. We can make two important observations. First, the Borda recommendation strategy (in green) provides a significant reduction in the sorting time of the users compared to the random strategy (in blue); indeed, the sorting time of the users is reduced by up to …, thus validating the practical usefulness of our framework. The second observation is that the reduction in sorting time is not due to increasing familiarity with the interface: the average sorting time for the random strategy remains almost constant throughout the duration of the poll.
Conclusion

We proposed a recommendation framework to minimize time-to-rank. We formulated a theoretical model of the recommendation problem (including NP-hardness results and associated approximation algorithms), and illustrated the practical effectiveness of our approach in real-world experiments.

Our work opens up a number of directions for future research. In terms of theoretical questions, it would be interesting to analyze the complexity of the recommendation problem for other distance measures, e.g., the Ulam distance. On the practical side, it would be interesting to analyze the effect of cognitive biases such as the framing effect (Tversky and Kahneman, 1981) and list position bias (Lerman and Hogg, 2014) on the recommendation problem. Progress in this direction can, in turn, have implications for the fairness of recommendation algorithms.
Acknowledgments
We are grateful to the IJCAI-19 reviewers for their helpful comments. This work is supported by NSF …
References
Shivani Agarwal. On Ranking and Choice Models. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 4050–4053, 2016.

Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating Inconsistent Information: Ranking and Clustering. Journal of the ACM, 55(5):23, 2008.

William Albert and Thomas Tullis. Measuring the User Experience: Collecting, Analyzing, and Presenting Usability Metrics. Newnes, 2013.

Noga Alon. Ranking Tournaments. SIAM Journal on Discrete Mathematics, 20(1):137–142, 2006.

Pranjal Awasthi, Avrim Blum, Or Sheffet, and Aravindan Vijayaraghavan. Learning Mixtures of Ranking Models. In Proceedings of Advances in Neural Information Processing Systems, pages 2609–2617, 2014.

Nigel Bevan, James Carter, and Susan Harker. ISO 9241-11 Revised: What Have We Learnt About Usability Since 1998? In International Conference on Human-Computer Interaction, pages 143–151. Springer, 2015.

Avrim Blum, Jeffrey Jackson, Tuomas Sandholm, and Martin Zinkevich. Preference Elicitation and Query Learning. Journal of Machine Learning Research, 5:649–667, 2004.

Craig Boutilier. Computational Decision Support: Regret-Based Models for Optimization and Preference Elicitation. In P. H. Crowley and T. R. Zentall, editors, Comparative Decision Making: Analysis and Support Across Disciplines and Applications. Oxford University Press, 2013.

Mark Braverman and Elchanan Mossel. Noisy Sorting Without Resampling. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 268–276, 2008.

Róbert Busa-Fekete, Eyke Hüllermeier, and Balázs Szörényi. Preference-Based Rank Elicitation Using Statistical Models: The Case of Mallows. In Proceedings of the 31st International Conference on Machine Learning, pages II:1071–1079, 2014.

Irène Charon and Olivier Hudry. An Updated Survey on the Linear Ordering Problem for Weighted or Unweighted Tournaments. Annals of Operations Research, 175(1):107–158, 2010.

Justin Cheng, Jaime Teevan, and Michael S. Bernstein. Measuring Crowdsourcing Effort with Error-Time Curves. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 1365–1374, 2015.

Wolfram Conen and Tuomas Sandholm. Minimal Preference Elicitation in Combinatorial Auctions. In IJCAI-2001 Workshop on Economic Agents, Models, and Mechanisms, pages 71–80, 2001.

Vincent Conitzer. Computing Slater Rankings Using Similarities Among Candidates. In Proceedings of the 21st National Conference on Artificial Intelligence, volume 1, pages 613–619, 2006.

Vincent Conitzer and Tuomas Sandholm. Vote Elicitation: Complexity and Strategy-Proofness. In Eighteenth National Conference on Artificial Intelligence, pages 392–397, 2002.

Don Coppersmith, Lisa K. Fleischer, and Atri Rudra. Ordering by Weighted Number of Wins Gives a Good Ranking for Weighted Tournaments. ACM Transactions on Algorithms, 6(3):55, 2010.

Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank Aggregation Methods for the Web. In Proceedings of the 10th World Wide Web Conference, pages 613–622, 2001.

John Guiver and Edward Snelson. Bayesian Inference for Plackett-Luce Ranking Models. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML-09), pages 377–384, Montreal, Quebec, Canada, 2009.

Maurice G. Kendall. A New Measure of Rank Correlation. Biometrika, 30(1/2):81–93, 1938.

Claire Kenyon-Mathieu and Warren Schudy. How to Rank with Few Errors: A PTAS for Weighted Feedback Arc Set on Tournaments. In Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, pages 95–103, 2007.

Ashish Khetan and Sewoong Oh. Data-Driven Rank Breaking for Efficient Rank Aggregation. Journal of Machine Learning Research, 17(193):1–54, 2016.

Sharon J. Laskowski, Marguerite Autry, John Cugini, and William Killam. Improving the Usability and Accessibility of Voting Systems and Products. NIST Special Publication, 500:256, 2004.

Kristina Lerman and Tad Hogg. Leveraging Position Bias to Improve Peer Recommendation. PLoS ONE, 9(6):e98914, 2014.

Tie-Yan Liu. Learning to Rank for Information Retrieval. Springer, 2011.

Tyler Lu and Craig Boutilier. Effective Sampling and Learning for Mallows Models with Pairwise-Preference Data. Journal of Machine Learning Research, 15:3963–4009, 2014.

Robert Duncan Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.

Eddy Maddalena, Marco Basaldella, Dario De Nart, Dante Degl'Innocenti, Stefano Mizzaro, and Gianluca Demartini. Crowdsourcing Relevance Assessments: The Unexpected Benefits of Limiting the Time to Judge. In Fourth AAAI Conference on Human Computation and Crowdsourcing, 2016.

Colin L. Mallows. Non-Null Ranking Model. Biometrika, 44(1/2):114–130, 1957.

Sahand Negahban, Sewoong Oh, and Devavrat Shah. Rank Centrality: Ranking from Pairwise Comparisons. Operations Research, 65(1):266–287, 2017.

Robin L. Plackett. The Analysis of Permutations. Journal of the Royal Statistical Society, Series C (Applied Statistics), 24(2):193–202, 1975.

Ariel D. Procaccia and Nisarg Shah. Optimal Aggregation of Uncertain Preferences. In Thirtieth AAAI Conference on Artificial Intelligence, pages 608–614, 2016.

Hossein Azari Soufiani, David C. Parkes, and Lirong Xia. Preference Elicitation for General Random Utility Models. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 596–605, 2013.

Amos Tversky and Daniel Kahneman. The Framing of Decisions and the Psychology of Choice. Science, 211(4481):453–458, 1981.

Zhibing Zhao and Lirong Xia. Composite Marginal Likelihood Methods for Random Utility Models. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5922–5931, 2018.

Zhibing Zhao, Peter Piech, and Lirong Xia. Learning Mixtures of Plackett-Luce Models. In Proceedings of the 33rd International Conference on Machine Learning (ICML-16), 2016.

Zhibing Zhao, Haoming Li, Junming Wang, Jeffrey Kephart, Nicholas Mattei, Hui Su, and Lirong Xia. A Cost-Effective Framework for Preference Elicitation and Aggregation. In Proceedings of the 2018 Conference on Uncertainty in Artificial Intelligence (UAI), 2018.
Appendix
7.1 Pairwise Marginals

The pairwise marginal distribution for the k-mixture Plackett-Luce model is given by

Pr_{σ∼k-PL}(a_i ≻_σ a_j) = Σ_{ℓ=1}^{k} γ^(ℓ) · θ^(ℓ)_i / (θ^(ℓ)_i + θ^(ℓ)_j).    (1)

Proposition 1 (Mallows, 1957). Let σ*, φ be the parameters of a Mallows model (MM), and let a_i, a_j ∈ A be such that a_i ≻_{σ*} a_j. Let Δ = rank(σ*, a_j) − rank(σ*, a_i). Then,

Pr_{σ∼(σ*,φ)}(a_i ≻_σ a_j) = (Σ_{z=1}^{Δ} z φ^{z−1}) / ((Σ_{z=0}^{Δ−1} φ^z)(Σ_{z=0}^{Δ} φ^z)).

The pairwise marginal for a k-mixture Mallows model, parameterized by {γ^(ℓ), σ*^(ℓ), φ^(ℓ)}_{ℓ=1}^{k}, can be derived similarly. Fix a pair a_i, a_j ∈ A. For each ℓ ∈ [k], let Δ^{i,j}_ℓ := rank(σ*^(ℓ), a_j) − rank(σ*^(ℓ), a_i). Define the function g_ℓ : Z \ {0} → R_{≥0} as

g_ℓ(Δ) := (Σ_{z=1}^{Δ} z φ^{z−1}_{(ℓ)}) / ((Σ_{z=0}^{Δ−1} φ^z_{(ℓ)})(Σ_{z=0}^{Δ} φ^z_{(ℓ)})) if Δ > 0, and
g_ℓ(Δ) := 1 − (Σ_{z=1}^{|Δ|} z φ^{z−1}_{(ℓ)}) / ((Σ_{z=0}^{|Δ|−1} φ^z_{(ℓ)})(Σ_{z=0}^{|Δ|} φ^z_{(ℓ)})) if Δ < 0.

Thus, g_ℓ(Δ^{i,j}_ℓ) is the pairwise marginal probability induced by the ℓ-th mixture component, i.e., g_ℓ(Δ^{i,j}_ℓ) = Pr_{σ∼(σ*^(ℓ),φ^(ℓ))}(a_i ≻_σ a_j). The pairwise marginal for the k-MM model is given by

Pr_{σ∼k-MM}(a_i ≻_σ a_j) = Σ_{ℓ=1}^{k} γ^(ℓ) g_ℓ(Δ^{i,j}_ℓ).    (2)

7.2 Kemeny and WFAST

Definition 7 (Kemeny). Given a preference profile {σ^(i)}_{i=1}^{n} and a number δ ∈ Q, does there exist σ ∈ L(A) such that Σ_{i=1}^{n} d_kt(σ, σ^(i)) ≤ δ?

Kemeny is known to be NP-complete even for n = 4 (Dwork et al., 2001).

Definition 8 (Weighted Feedback Arc Set in Tournaments (WFAST)). Given a complete directed graph G = (V, E), a set of non-negative edge weights {w_{i,j}, w_{j,i}}_{(i,j)∈E} where w_{i,j} + w_{j,i} = b for some fixed constant b ∈ (0, 1], and a number δ ∈ Q, does there exist σ ∈ L(A) such that Σ_{i,j∈V} w_{j,i} · 1[a_i ≻_σ a_j] ≤ δ?

WFAST is known to be NP-complete even when w_{i,j} = 1 if (i, j) ∈ E and 0 otherwise (Ailon et al., 2008; Alon, 2006; Conitzer, 2006; Charon and Hudry, 2010). A polynomial-time approximation scheme (PTAS) for WFAST is also known (Kenyon-Mathieu and Schudy, 2007). Proposition 2 recalls this result.

Proposition 2 (Kenyon-Mathieu and Schudy, 2007). Given any ε > 0 and an instance of WFAST, there exists an algorithm that runs in time |V|^{Õ(1/ε)} and returns a linear order σ such that

Σ_{i,j∈V} w_{j,i} · 1[a_i ≻_σ a_j] ≤ (1 + ε) Σ_{i,j∈V} w_{j,i} · 1[a_i ≻_{σ*} a_j],

where σ* ∈ arg min_{τ∈L(A)} Σ_{i,j∈V} w_{j,i} · 1[a_i ≻_τ a_j].

When b = 1, WFAST admits a 5-approximation algorithm based on the Borda count voting rule (i.e., ordering the vertices in increasing order of their weighted indegrees).

Proposition 3 (Coppersmith et al., 2010). There is a polynomial-time algorithm that, given any instance of WFAST with b = 1, returns a linear order σ such that Σ_{i,j∈V} w_{j,i} · 1[a_i ≻_σ a_j] ≤ 5 · Σ_{i,j∈V} w_{j,i} · 1[a_i ≻_{σ*} a_j], where σ* ∈ arg min_{τ∈L(A)} Σ_{i,j∈V} w_{j,i} · 1[a_i ≻_τ a_j].
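For concreteness, Proposition 1 (and the function g_ℓ above) can be evaluated in a few lines of code; a sketch (ours):

```python
def mallows_pairwise(delta, phi):
    # Pr_{sigma ~ (sigma*, phi)}(a_i beats a_j), where delta = rank(sigma*,
    # a_j) - rank(sigma*, a_i) != 0, following Proposition 1 / g_l above.
    d = abs(delta)
    num = sum(z * phi ** (z - 1) for z in range(1, d + 1))
    den = sum(phi ** z for z in range(d)) * sum(phi ** z for z in range(d + 1))
    p = num / den
    return p if delta > 0 else 1.0 - p

# Sanity checks: adjacent alternatives under phi = 1 (uniform) are tied,
# and as phi -> 0 the model concentrates on the reference ranking.
assert mallows_pairwise(1, 1.0) == 0.5
assert mallows_pairwise(1, 1e-9) > 0.999
```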
7.3 Proof of Lemma 1

Lemma 1. For any two sorting algorithms A, A′ ∈ 𝒜, any σ, τ ∈ L(A), and any ℓ ∈ [m − 1], f^{σ→τ}_A(ℓ) = f^{σ→τ}_{A′}(ℓ).

Proof. We will prove Lemma 1 via induction on the number of alternatives m. The base case of m = 1 is trivial. Suppose the lemma holds for all alternative sets of size m ≤ n − 1. We will show that the lemma also holds for m = n.

Let σ, τ be any two linear orders over the same set of n alternatives, namely A. Let a := τ(1) be the most preferred alternative under τ, and let a be ranked k-th under σ, i.e., σ(k) = a. Let σ_{−a} and τ_{−a} denote the truncated linear orders obtained by dropping the alternative a from σ and τ, respectively. We will show that for any sorting algorithm A ∈ 𝒜, the following conditions hold:

If k = n, then f^{σ→τ}_A(ℓ) = f^{σ_{−a}→τ_{−a}}_A(ℓ) for all ℓ ∈ [n − 2], and f^{σ→τ}_A(n − 1) = 1;    (3)

and if k < n, then f^{σ→τ}_A(ℓ) = f^{σ_{−a}→τ_{−a}}_A(ℓ) for all ℓ ∈ [n − 2] \ {k − 1}, f^{σ→τ}_A(k − 1) = f^{σ_{−a}→τ_{−a}}_A(k − 1) + 1, and f^{σ→τ}_A(n − 1) = 0.    (4)

Note that the claims in Equations (3) and (4) suffice to prove the lemma: indeed, σ_{−a} and τ_{−a} are valid linear orders over the same set of (n − 1) alternatives, namely A \ {a}. Therefore, by the induction hypothesis, we have that for any two sorting algorithms A, A′ ∈ 𝒜 and any ℓ ∈ [n − 2],

f^{σ_{−a}→τ_{−a}}_A(ℓ) = f^{σ_{−a}→τ_{−a}}_{A′}(ℓ).    (5)

Equations (3) to (5) together give us that f^{σ→τ}_A(ℓ) = f^{σ→τ}_{A′}(ℓ) for all ℓ ∈ [n − 1], as desired.

To prove the claims in Equations (3) and (4), recall from Section 3.1 that a sorting algorithm A is a sequence of steps s_1, s_2, . . . such that every step corresponds to either selection or insertion sort, i.e., s_j ∈ {SEL, INS} for every j. We will prove the claims via case analysis based on whether A performs a selection sort operation during the first k steps or not.

Case I: At least one of the first k steps s_1, . . . , s_k is selection sort.

Let 1 ≤ i ≤ k be such that s_i = SEL and s_j = INS for all 1 ≤ j < i. In the first (i − 1) steps (which are all insertion sort operations), the algorithm A only considers the top (i − 1) alternatives in σ, namely P_{i−1}(σ). Furthermore, since i − 1 < k, we have that a ∉ P_{i−1}(σ). Therefore, the top (i − 1) alternatives in σ are identical to those in σ_{−a}, and the execution of A during σ → τ is identical to that during σ_{−a} → τ_{−a} for the first (i − 1) steps. Stated differently, if f^{σ,i}_A(ℓ) and f^{σ_{−a},i}_A(ℓ) denote the number of move-up-by-ℓ-positions operations performed by A during the first i steps for the inputs σ and σ_{−a} respectively, then f^{σ,i−1}_A(ℓ) = f^{σ_{−a},i−1}_A(ℓ) for all ℓ ∈ [n − 2] and f^{σ,i−1}_A(n − 1) = 0.
At the i-th step, A performs a selection sort operation. This involves promoting the alternative a by (k − 1) positions to the top of the current list. Therefore, at the end of the first i steps, we have:

If k = n, then f^{σ,i}_A(ℓ) = f^{σ_{−a},i−1}_A(ℓ) for all ℓ ∈ [n − 2], and f^{σ,i}_A(n − 1) = 1;    (6)

and if k < n, then f^{σ,i}_A(ℓ) = f^{σ_{−a},i−1}_A(ℓ) for all ℓ ∈ [n − 2] \ {k − 1}, f^{σ,i}_A(k − 1) = f^{σ_{−a},i−1}_A(k − 1) + 1, and f^{σ,i}_A(n − 1) = 0.    (7)

Let σ′ denote the list maintained by A at the end of the i-th step during σ → τ. In addition, let σ′′ denote the list maintained by A at the end of the (i − 1)-th step during σ_{−a} → τ_{−a}. We therefore have that

f^{σ→τ}_A(ℓ) = f^{σ,i}_A(ℓ) + f^{σ′→τ}_A(ℓ) for every ℓ ∈ [n − 1], and
f^{σ_{−a}→τ_{−a}}_A(ℓ) = f^{σ_{−a},i−1}_A(ℓ) + f^{σ′′→τ_{−a}}_A(ℓ) for every ℓ ∈ [n − 2].    (8)

Observe that σ′ = (a, σ′′). Consider the execution of A during σ′ → τ and during σ′′ → τ_{−a}. From Lemma 2 (stated below), we have that

f^{σ′→τ}_A(ℓ) = f^{σ′′→τ_{−a}}_A(ℓ) for all ℓ ∈ [n − 2] and f^{σ′→τ}_A(n − 1) = 0.    (9)
Equations (6) to (9) together give the desired claim.

Case II: Each of the first k steps is insertion sort, i.e., s_1 = INS, . . . , s_k = INS.

The analysis in this case is identical to that of Case I for the first (k − 1) steps. That is, at the end of the first (k − 1) steps, f^{σ,k−1}_A(ℓ) = f^{σ_{−a},k−1}_A(ℓ) for all ℓ ∈ [n − 2] and f^{σ,k−1}_A(n − 1) = 0.
Note that the alternative a continues to be at the k-th position in the current list at the end of the first (k − 1) steps. At the k-th step, A performs an insertion sort operation. Since a is the most preferred alternative under τ, this step once again involves promoting a by (k − 1) positions to the top of the current list, i.e., the count function is modified exactly as in Case I. The rest of the analysis is identical to Case I as well. This finishes the proof of Lemma 1.
Lemma 2. Let a ∈ A and σ_{−a}, τ_{−a} ∈ L(A \ {a}). Let σ, τ ∈ L(A) be such that σ := (a, σ_{−a}) and τ := (a, τ_{−a}). Then, for any sorting algorithm A ∈ 𝒜, f^{σ→τ}_A(ℓ) = f^{σ_{−a}→τ_{−a}}_A(ℓ) for all ℓ ∈ [m − 2] and f^{σ→τ}_A(m − 1) = 0, where |A| = m.
Proof. We will first argue that f^{σ→τ}_A(m − 1) = 0.
1) = 0 . Suppose, for contradiction, that f σ → τ A ( m − > ,that is, some alternative (say, b ) is promoted by ( m − positions during the execution of A . Sinceboth selection and insertion sort maintain the sorted prefix property at every time step, it must be that b (cid:31) τ a , which is a contradiction since a is the most preferred alternative under τ .Next, we will argue that f σ → τ A ( (cid:96) ) = f σ − a → τ − a A ( (cid:96) ) for all (cid:96) ∈ [ m − . Once again, by the sortedprefix property, no alternative is promoted above a at any time step during σ → τ . Since the topposition remains fixed, the execution of A during σ − a → τ − a can be mimicked to obtain the executionof A during σ → τ . The lemma now follows. 15 .4 Proof of Theorem 1 Theorem 1.
7.4 Proof of Theorem 1

Theorem 1. For any σ, τ ∈ L(A), time_{w_lin}(σ, τ) = d_kt(σ, τ) and time_{w_aff}(σ, τ) = c · d_kt(σ, τ) + d · moves(σ, τ).

Proof. For the linear weight function w_lin, we have time_{w_lin}(σ, τ) = Σ_{ℓ=1}^{m−1} f^{σ→τ}(ℓ) · ℓ. Regardless of the choice of the sorting algorithm, any fixed pair of alternatives is swapped at most once during the transformation from σ to τ. As a result, each 'move up by ℓ positions' operation, which contributes ℓ units to the time function, also contributes ℓ units to the Kendall's Tau distance, giving us the desired claim.

For the affine weight function w_aff, we therefore have time_{w_aff}(σ, τ) = c · time_{w_lin}(σ, τ) + d · Σ_{ℓ=1}^{m−1} f^{σ→τ}(ℓ) = c · d_kt(σ, τ) + d · moves(σ, τ), as desired.

7.5 Proof of Theorem 2

Theorem 2 (Exact Algorithms). (D, w)-Recommendation is solvable in polynomial time when w is linear and D is either (a) k-mixture Plackett-Luce (k-PL) with k = 1, (b) k-mixture Mallows model (k-MM) with k = 1, or (c) a uniform distribution with support size n ≤ 2.

Proof. (a) When D is a k-mixture Plackett-Luce (k-PL) with k = 1.

The expected time for any σ ∈ L(A) under the PL model with parameter θ is given by

E_{τ∼θ}[time_{w_lin}(σ, τ)]
  = E_{τ∼θ}[d_kt(σ, τ)]   (Theorem 1)
  = E_{τ∼θ}[ Σ_{a_i,a_j∈A : a_i ≻_σ a_j} 1[a_j ≻_τ a_i] ]   (Definition 1)
  = Σ_{a_i,a_j∈A : a_i ≻_σ a_j} E_{τ∼θ}[ 1[a_j ≻_τ a_i] ]   (linearity of expectation)
  = Σ_{a_i,a_j∈A : a_i ≻_σ a_j} Pr_{τ∼θ}(a_j ≻_τ a_i)
  = Σ_{a_i,a_j∈A : a_i ≻_σ a_j} θ_j / (θ_i + θ_j).   (Definition 2)    (10)

Let σ* ∈ L(A) be a linear order that is consistent with the parameter θ. That is, for any a_i, a_j ∈ A, a_i ≻_{σ*} a_j if and only if either θ_i > θ_j, or i < j in case θ_i = θ_j. We will show via an exchange argument that for any σ ∈ L(A), E_{τ∼θ}[time_{w_lin}(σ*, τ)] ≤ E_{τ∼θ}[time_{w_lin}(σ, τ)]. The desired implication will then follow by simply computing σ*, which can be done in polynomial time.

Consider a pair of alternatives a_i, a_j ∈ A that are adjacent in σ such that a_i ≻_{σ*} a_j and a_j ≻_σ a_i (such a pair must exist as long as σ ≠ σ*; note that a_i, a_j need not be adjacent in σ*). Let σ′ ∈ L(A) be derived from σ by swapping a_i and a_j (and making no other changes). Then, from Equation (10), we have that

E_{τ∼θ}[time_{w_lin}(σ′, τ)] − E_{τ∼θ}[time_{w_lin}(σ, τ)] = θ_j/(θ_i + θ_j) − θ_i/(θ_i + θ_j) ≤ 0,

where the inequality holds because σ* is consistent with θ and a_i ≻_{σ*} a_j. By repeated use of the above argument (with σ′ taking the role of σ, and so on), we get the desired claim.

(b) When D is a k-mixture Mallows model (k-MM) with k = 1.

The proof is similar to case (a). Once again, we let σ and σ′ be two linear orders that are identical except for the pair a_i, a_j ∈ A that are adjacent in σ such that a_i ≻_{σ*} a_j, a_j ≻_σ a_i, and a_i ≻_{σ′} a_j; here σ* is the reference ranking for the Mallows model.
Then,

E_{τ∼(σ*,φ)}[time_{w_lin}(σ′, τ)] − E_{τ∼(σ*,φ)}[time_{w_lin}(σ, τ)]
  = E_{τ∼(σ*,φ)}[d_kt(σ′, τ)] − E_{τ∼(σ*,φ)}[d_kt(σ, τ)]   (by Theorem 1)
  = Pr_{τ∼(σ*,φ)}(a_j ≻_τ a_i) − Pr_{τ∼(σ*,φ)}(a_i ≻_τ a_j)
  = 2 (1/2 − Pr_{τ∼(σ*,φ)}(a_i ≻_τ a_j))
  = 2 (1/2 − (Σ_{z=1}^{Δ} z φ^{z−1}) / ((Σ_{z=0}^{Δ−1} φ^z)(Σ_{z=0}^{Δ} φ^z)))   (by Proposition 1),

where Δ = rank(σ*, a_j) − rank(σ*, a_i). It is easy to verify that g(Δ) := (Σ_{z=1}^{Δ} z φ^{z−1}) / ((Σ_{z=0}^{Δ−1} φ^z)(Σ_{z=0}^{Δ} φ^z)) ≥ 1/2 for all integral Δ ≥ 1 whenever φ ∈ [0, 1]. This implies that E_{τ∼(σ*,φ)}[time_{w_lin}(σ′, τ)] ≤ E_{τ∼(σ*,φ)}[time_{w_lin}(σ, τ)]. Repeated application of the above argument shows that for any linear order σ ∈ L(A), E_{τ∼(σ*,φ)}[time_{w_lin}(σ*, τ)] ≤ E_{τ∼(σ*,φ)}[time_{w_lin}(σ, τ)]. The desired implication follows by simply returning the reference ranking σ* as the output.

(c) When D is a uniform distribution with support size n ≤ 2.

Let D be a uniform distribution over the set of n linear orders {σ^(i)}_{i=1}^{n}. From Theorem 1, we know that for any σ ∈ L(A), we have E_{τ∼D}[time_{w_lin}(σ, τ)] = (1/n) Σ_{i=1}^{n} d_kt(σ, σ^(i)). When n = 1, it is clear that σ = σ^(1) is the unique minimizer of the expected cost. When n = 2, it can be argued that σ ∈ {σ^(1), σ^(2)} is the desired solution. Indeed, let S := {(a_i, a_j) ∈ A × A : a_i ≻_{σ^(1)} a_j and a_j ≻_{σ^(2)} a_i} be the set of (ordered) pairs of alternatives over which σ^(1) and σ^(2) disagree. Any linear order σ ∉ {σ^(1), σ^(2)} contributes at least |S| to the expected time, in addition to the number of pairs over which σ differs from σ^(1) or σ^(2). Hence, the expected time is minimized when σ ∈ {σ^(1), σ^(2)}.

7.6 Proof of Theorem 3

Theorem 3 (Hardness Results). (D, w)-Recommendation is NP-complete even when w is linear and D is either (a) k-mixture Plackett-Luce model (k-PL) for k = 4, (b) k-mixture Mallows model (k-MM) for k = 4, or (c) a uniform distribution over n = 4 linear orders.

Proof. (a) When D is a k-mixture Plackett-Luce model (k-PL) for k = 4.

Let D be a k-mixture Plackett-Luce model with parameters {γ^(ℓ), θ^(ℓ)}_{ℓ=1}^{k}, and let σ ∈ L(A). By an argument similar to that in the proof of Theorem 2, we have that

E_{τ∼k-PL}[time_{w_lin}(σ, τ)] = Σ_{a_i,a_j∈A : a_i ≻_σ a_j} Σ_{ℓ=1}^{k} γ^(ℓ) · θ^(ℓ)_j / (θ^(ℓ)_i + θ^(ℓ)_j),    (11)

which can be computed in polynomial time. Hence the problem is in NP.

To prove NP-hardness, we will show a reduction from a restricted version of Kemeny for four agents, which is known to be NP-complete (Dwork et al., 2001). Given any instance of Kemeny with the preference profile {σ^(ℓ)}_{ℓ=1}^{n} where n = 4, the parameters of D are set up as follows: the number of mixtures is set to k = n = 4. For each ℓ ∈ [n], γ^(ℓ) = 1/n, and for each a_i ∈ A, θ^(ℓ)_i = m^{2(m − rank(σ^(ℓ), a_i))}. Thus, for instance, if σ^(1) = (a_1 ≻ a_2 ≻ . . . ≻ a_m), then θ^(1)_1 = m^{2(m−1)}, θ^(1)_2 = m^{2(m−2)}, . . . , θ^(1)_m = 1.
Notice that despite being exponential in m, the parameters {θ^(ℓ)_i}_{ℓ∈[n],i∈[m]} can each be specified in poly(m) number of bits, and are therefore polynomial in the input size.

We will now argue that a linear order σ ∈ L(A) satisfies Σ_{i=1}^{n} d_kt(σ, σ^(i)) ≤ δ if and only if E_{τ∼k-PL}[time_{w_lin}(σ, τ)] ≤ δ + 0.5. First, suppose that σ satisfies Σ_{i=1}^{n} d_kt(σ, σ^(i)) ≤ δ. Define, for each ℓ ∈ [n], S_ℓ := {(a_i, a_j) ∈ A × A : a_i ≻_σ a_j and a_j ≻_{σ^(ℓ)} a_i}. Thus, Σ_{ℓ=1}^{n} |S_ℓ| ≤ δ. Then, from Equation (11), we have that

E_{τ∼k-PL}[time_{w_lin}(σ, τ)]
  = Σ_{a_i,a_j∈A : a_i ≻_σ a_j} Σ_{ℓ=1}^{n} (1/n) · θ^(ℓ)_j / (θ^(ℓ)_i + θ^(ℓ)_j)
  = (1/n) Σ_{ℓ=1}^{n} ( Σ_{(a_i,a_j)∈S_ℓ} θ^(ℓ)_j/(θ^(ℓ)_i + θ^(ℓ)_j) + Σ_{(a_i,a_j)∉S_ℓ} θ^(ℓ)_j/(θ^(ℓ)_i + θ^(ℓ)_j) )
  ≤ (1/n) Σ_{ℓ=1}^{n} ( Σ_{(a_i,a_j)∈S_ℓ} 1 + Σ_{(a_i,a_j)∉S_ℓ} 1/m² )
  ≤ (1/n) Σ_{ℓ=1}^{n} ( δ + (m(m−1)/2)/m² )
  = δ + (m(m−1)/2)/m²
  ≤ δ + 0.5.

The first inequality follows from the choice of parameters in our construction. Indeed, for any (a_i, a_j) ∈ S_ℓ, we have θ^(ℓ)_i, θ^(ℓ)_j ≥ 1 and therefore θ^(ℓ)_j/(θ^(ℓ)_i + θ^(ℓ)_j) ≤ 1. In addition, for any (a_i, a_j) ∉ S_ℓ, we have that rank(σ^(ℓ), a_i) < rank(σ^(ℓ), a_j), and therefore θ^(ℓ)_j/(θ^(ℓ)_i + θ^(ℓ)_j) ≤ m^{2(rank(σ^(ℓ),a_i) − rank(σ^(ℓ),a_j))} ≤ 1/m². The second inequality uses the fact that Σ_{ℓ=1}^{n} |S_ℓ| ≤ δ. The final inequality holds because m(m−1)/(2m²) < 1/2 for all m ≥ 1.

Now suppose that σ satisfies E_{τ∼k-PL}[time_{w_lin}(σ, τ)] ≤ δ + 0.5. We will argue that Σ_{i=1}^{n} d_kt(σ, σ^(i)) must be strictly smaller than δ + 1, which, by integrality, will give us the desired claim. Suppose, for contradiction, that Σ_{i=1}^{n} d_kt(σ, σ^(i)) ≥ δ + 1. Thus, Σ_{ℓ=1}^{n} |S_ℓ| ≥ δ + 1. We can use this relation to construct a lower bound on the expected time, as follows:

E_{τ∼k-PL}[time_{w_lin}(σ, τ)]
  = (1/n) Σ_{ℓ=1}^{n} ( Σ_{(a_i,a_j)∈S_ℓ} θ^(ℓ)_j/(θ^(ℓ)_i + θ^(ℓ)_j) + Σ_{(a_i,a_j)∉S_ℓ} θ^(ℓ)_j/(θ^(ℓ)_i + θ^(ℓ)_j) )
  ≥ (δ + 1) · m²/(m² + 1)
  > δ + 0.5.

The first inequality holds because for any (a_i, a_j) ∉ S_ℓ, we have θ^(ℓ)_j/(θ^(ℓ)_i + θ^(ℓ)_j) ≥ 0, and for any (a_i, a_j) ∈ S_ℓ, we have θ^(ℓ)_j/(θ^(ℓ)_i + θ^(ℓ)_j) = m^{2(rank(σ^(ℓ),a_i) − rank(σ^(ℓ),a_j))} / (1 + m^{2(rank(σ^(ℓ),a_i) − rank(σ^(ℓ),a_j))}) ≥ m²/(m² + 1). The second inequality holds because (δ + 1)/(1 + m²) < 1/2 (note that we can assume m to be sufficiently large without loss of generality). The chain of inequalities gives us the desired contradiction. Hence, it must be that Σ_{i=1}^{n} d_kt(σ, σ^(i)) ≤ δ. This finishes the proof of part (a) of Theorem 3.
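A numerical sanity check of the construction above (the exponents follow our reconstruction θ^(ℓ)_i = m^{2(m − rank)}; this code is illustrative only):

```python
def reduction_marginal(rank_i, rank_j, m):
    # Pairwise marginal theta_j / (theta_i + theta_j) under the construction
    # theta_i = m**(2 * (m - rank_i)) from the proof sketch above: close to 0
    # when a_i is ranked above a_j in sigma^(l), and close to 1 otherwise.
    ti = m ** (2 * (m - rank_i))
    tj = m ** (2 * (m - rank_j))
    return tj / (ti + tj)

# With m = 10 alternatives, even adjacent ranks are separated sharply, so
# each mixture component essentially reproduces the indicator in d_kt.
assert reduction_marginal(1, 2, 10) < 1 / 10**2 < 0.99 < reduction_marginal(2, 1, 10)
```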
(b) When $\mathcal{D}$ is a $k$-mixture Mallows model ($k$-MM) with $k = 4$

Let $\mathcal{D}$ be a $k$-mixture Mallows model with the parameters $\{\gamma^{(\ell)}, \sigma^{*(\ell)}, \phi^{(\ell)}\}_{\ell=1}^{k}$, and let $\sigma \in \mathcal{L}(A)$. By an argument similar to that in the proof of Theorem 2, we have that
\[
\mathbb{E}_{\tau \sim k\text{-MM}}[\mathrm{time}_{w_{\mathrm{lin}}}(\sigma, \tau)] = \sum_{a_i, a_j \in A \,:\, a_i \succ_\sigma a_j} \;\sum_{\ell=1}^{k} \gamma^{(\ell)} \cdot \Pr\nolimits_{\tau \sim (\sigma^{*(\ell)}, \phi^{(\ell)})}(a_j \succ_\tau a_i),
\]
which, by Proposition 1, can be computed in polynomial time; hence the problem is in NP. To prove NP-hardness, given any instance of Kemeny with the preference profile $\{\sigma^{(\ell)}\}_{\ell=1}^{n}$, the parameters of $\mathcal{D}$ are set up as follows: The number of mixtures $k$ is set to $n$. For each $\ell \in [n]$, $\gamma^{(\ell)} = \frac{1}{n}$, $\sigma^{*(\ell)} = \sigma^{(\ell)}$, and $\phi^{(\ell)} = 0$. The expected time for any linear order $\sigma$ is then simply its average Kendall's Tau distance from the profile $\{\sigma^{(\ell)}\}_{\ell=1}^{n}$, hence the equivalence of the solutions follows. Finally, since Kemeny is known to be NP-complete even for $n = 4$, a similar implication holds for (D, w)-Recommendation when $k = 4$.

(c) When $\mathcal{D}$ is a uniform distribution over $n = 4$ linear orders

Membership in NP follows from Theorem 1, since for the linear weight function, the expected time of any linear order $\sigma \in \mathcal{L}(A)$ is equal to its average Kendall's Tau distance from the preference profile that supports $\mathcal{D}$, which can be computed in polynomial time. In addition, NP-hardness follows from a straightforward reduction from Kemeny: Given any instance of Kemeny with the preference profile $\{\sigma^{(i)}\}_{i=1}^{n}$, the distribution $\mathcal{D}$ in (D, w)-Recommendation is simply a uniform distribution over $\{\sigma^{(i)}\}_{i=1}^{n}$. The equivalence of the solutions follows once again from Theorem 1. Finally, since Kemeny is known to be NP-complete even for $n = 4$, a similar implication holds for (D, w)-Recommendation as well.
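The equivalence in part (c) is short enough to state in code: under a uniform $\mathcal{D}$ and the linear weight function, an optimal recommendation is exactly a Kemeny ranking of the supporting profile. A brute-force sketch (function names ours; `kendall_tau` as above), feasible only for tiny $m$, in line with the hardness just established:

```python
# Under uniform D and linear w, Theorem 1 reduces recommendation to Kemeny:
# minimize the (average) Kendall tau distance to the supporting profile.
from itertools import permutations

def best_recommendation_unif(profile, m):
    """Exhaustive minimizer over L(A); exponential in m, per Theorem 3(c)."""
    return min(permutations(range(m)),
               key=lambda sigma: sum(kendall_tau(sigma, v) for v in profile))

profile = [(0, 1, 2, 3), (1, 0, 3, 2), (0, 2, 1, 3), (1, 0, 2, 3)]  # n = 4 voters
sigma_star = best_recommendation_unif(profile, 4)                   # a Kemeny ranking
```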
Theorem 4 (PTAS). (D, w)-Recommendation admits a polynomial time approximation scheme (PTAS) when $w$ is linear and $\mathcal{D}$ is either (a) a $k$-mixture Plackett-Luce model ($k$-PL) for any $k \in \mathbb{N}$, (b) a $k$-mixture Mallows model ($k$-MM) for any $k \in \mathbb{N}$, or (c) a uniform distribution (Unif).

Proof. We will show that in each of the three settings of Theorem 4, (D, w)-Recommendation turns out to be a special case of WFAST, and therefore the PTAS of Proposition 2 from Section 7.2 applies.

(a) When $\mathcal{D}$ is a $k$-mixture Plackett-Luce model ($k$-PL)

Recall from Theorem 1 that when the weight function is linear, the expected cost of $\sigma \in \mathcal{L}(A)$ is given by $\mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w_{\mathrm{lin}}}(\sigma, \tau)] = \mathbb{E}_{\tau \sim \mathcal{D}}[d_{\mathrm{kt}}(\sigma, \tau)]$. When $\mathcal{D}$ is a $k$-mixture Plackett-Luce model ($k$-PL) with the parameters $\{\gamma^{(\ell)}, \vec{\theta}^{(\ell)}\}_{\ell=1}^{k}$, the expected cost of $\sigma$ under $\mathcal{D}$ is given by (refer to Equation (10) in the proof of Theorem 2):
\[
\mathbb{E}_{\tau \sim k\text{-PL}}[\mathrm{time}_{w_{\mathrm{lin}}}(\sigma, \tau)] = \sum_{a_i, a_j \in A \,:\, a_i \succ_\sigma a_j} \;\sum_{\ell=1}^{k} \gamma^{(\ell)} \cdot \frac{\theta_j^{(\ell)}}{\theta_i^{(\ell)} + \theta_j^{(\ell)}}.
\]
Consider a complete, directed, and weighted graph $G = (A, E)$ defined over the set of alternatives, where for every pair of alternatives $a_i, a_j$, we have $(a_i, a_j) \in E$ if and only if either $\theta_i > \theta_j$, or $i < j$ in case $\theta_i = \theta_j$ (the orientation merely fixes one directed edge per unordered pair; since both weights below are retained, the choice is immaterial). Each edge $(a_i, a_j) \in E$ is associated with a pair of weights
\[
w_{i,j} = \sum_{\ell=1}^{k} \gamma^{(\ell)} \cdot \frac{\theta_i^{(\ell)}}{\theta_i^{(\ell)} + \theta_j^{(\ell)}} \quad \text{and} \quad w_{j,i} = \sum_{\ell=1}^{k} \gamma^{(\ell)} \cdot \frac{\theta_j^{(\ell)}}{\theta_i^{(\ell)} + \theta_j^{(\ell)}}.
\]
Notice that $w_{i,j} + w_{j,i} = 1$ for every $(a_i, a_j) \in E$. Furthermore, the expected cost of $\sigma$ can be expressed in terms of the edge weights as follows:
\[
\mathbb{E}_{\tau \sim k\text{-PL}}[\mathrm{time}_{w_{\mathrm{lin}}}(\sigma, \tau)] = \sum_{a_i, a_j \in A} w_{j,i} \cdot [a_i \succ_\sigma a_j].
\]
Therefore, $\sigma$ is a solution of (D, w)-Recommendation if and only if it is a solution of WFAST for the graph $G$ constructed above (with $b = 1$).

(b) When $\mathcal{D}$ is a $k$-mixture Mallows model ($k$-MM)

An analogous argument works for the case when $\mathcal{D}$ is a $k$-mixture Mallows model ($k$-MM) with the parameters $\{\gamma^{(\ell)}, \sigma^{*(\ell)}, \phi^{(\ell)}\}_{\ell=1}^{k}$. In this case, we set the weights to be $w_{i,j} = \sum_{\ell=1}^{k} \gamma^{(\ell)} \cdot g_\ell(\Delta^{i,j}_\ell)$ and $w_{j,i} = \sum_{\ell=1}^{k} \gamma^{(\ell)} \cdot \big(1 - g_\ell(\Delta^{i,j}_\ell)\big)$, where $\Delta^{i,j}_\ell$ and $g_\ell(\cdot)$ are as defined in Equation (2) in Section 7.1.

(c) When $\mathcal{D}$ is a uniform distribution

Finally, when $\mathcal{D}$ is a uniform distribution over $\{\sigma^{(\ell)}\}_{\ell=1}^{n}$, an analogous argument works for $w_{i,j} = \sum_{\ell=1}^{n} \frac{1}{n} \cdot [a_i \succ_{\sigma^{(\ell)}} a_j]$ and $w_{j,i} = \sum_{\ell=1}^{n} \frac{1}{n} \cdot [a_j \succ_{\sigma^{(\ell)}} a_i]$.
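The reduction to WFAST is mechanical to implement. A short Python sketch (names ours; it reuses `gammas`, `thetas`, and `expected_cost` from the earlier sketch): it builds the pairwise weight matrix for the $k$-PL case, verifies the probability constraint $w_{i,j} + w_{j,i} = 1$, and confirms that the WFAST objective of $\sigma$ coincides with its expected cost.

```python
# Reduction to WFAST for k-PL: pairwise weights w[i][j] = Pr(a_i beats a_j).
from fractions import Fraction
from itertools import permutations

def wfast_weights(gammas, thetas, m):
    w = [[Fraction(0)] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                w[i][j] = sum(g * th[i] / (th[i] + th[j])
                              for g, th in zip(gammas, thetas))
    return w

def backedge_weight(sigma, w):
    """WFAST objective: total weight of edges pointing backwards in sigma."""
    m = len(sigma)
    return sum(w[sigma[q]][sigma[p]]          # w_{j,i} for each a_i placed before a_j
               for p in range(m) for q in range(p + 1, m))

w = wfast_weights(gammas, thetas, 4)          # parameters from the earlier sketch
assert all(w[i][j] + w[j][i] == 1 for i in range(4) for j in range(4) if i != j)
for sigma in permutations(range(4)):
    assert backedge_weight(sigma, w) == expected_cost(sigma, gammas, thetas)
```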
Theorem 5 (Constant-factor approximation). (D, w)-Recommendation admits a polynomial time constant-factor approximation algorithm when $w$ is linear and $\mathcal{D}$ is either (a) a $k$-mixture Plackett-Luce model ($k$-PL) for any $k \in \mathbb{N}$, (b) a $k$-mixture Mallows model ($k$-MM) for any $k \in \mathbb{N}$, or (c) a uniform distribution (Unif).

Proof. (Sketch) The proof is similar to that of Theorem 4 in Section 7.7. The only difference is that we use the algorithm in Proposition 3 (from Section 7.2) instead of Proposition 2 as a subroutine. Notice that the condition $w_{i,j} + w_{j,i} = 1$ is satisfied for every $(a_i, a_j) \in E$, and thus Proposition 3 is applicable.

Theorem 6 (Approximation for general weights). Given any $\varepsilon > 0$ and any weight function $w$ that is $(\alpha, \beta)$-close to the linear weight function $w_{\mathrm{lin}}$, there exists an algorithm that runs in time $m^{\tilde{O}(1/\varepsilon)}$ and returns a linear order $\sigma$ such that $\mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w}(\sigma, \tau)] \le \alpha\beta(1+\varepsilon)\, \mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w}(\sigma^*, \tau)]$, where $\sigma^* \in \arg\min_{\sigma' \in \mathcal{L}(A)} \mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w}(\sigma', \tau)]$.

Proof. We will show that the linear order $\sigma$ constructed in Theorem 4 provides the desired approximation guarantee. Let $\sigma_{\mathrm{lin}} \in \arg\min_{\sigma \in \mathcal{L}(A)} \mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w_{\mathrm{lin}}}(\sigma, \tau)]$. Then,
\[
\begin{aligned}
\mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w}(\sigma, \tau)]
&\le \alpha\, \mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w_{\mathrm{lin}}}(\sigma, \tau)] && \text{(by closeness of weights)}\\
&\le \alpha(1+\varepsilon)\, \mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w_{\mathrm{lin}}}(\sigma_{\mathrm{lin}}, \tau)] && \text{(by Theorem 4)}\\
&\le \alpha(1+\varepsilon)\, \mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w_{\mathrm{lin}}}(\sigma^*, \tau)] && \text{(by optimality of } \sigma_{\mathrm{lin}})\\
&\le \alpha\beta(1+\varepsilon)\, \mathbb{E}_{\tau \sim \mathcal{D}}[\mathrm{time}_{w}(\sigma^*, \tau)] && \text{(by closeness of weights).}
\end{aligned}
\]
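Proposition 3 itself is not reproduced in this part of the paper, so the following is only an illustrative stand-in, not necessarily the subroutine the proof of Theorem 5 invokes: a pivot-based (QuickSort-style) ordering routine in the spirit of KwikSort (Ailon et al., 2008), which underlies standard constant-factor guarantees for WFAST instances satisfying the probability constraint $w_{i,j} + w_{j,i} = 1$. It reuses the weight matrix `w` from the previous sketch.

```python
# Illustrative pivot-based ordering (KwikSort-style, after Ailon et al. 2008)
# for a WFAST instance with probability constraints; shown only as a generic
# subroutine sketch, not necessarily the algorithm of Proposition 3.
import random

def kwiksort(items, w, rng=random.Random(0)):
    """Recursive pivoting: a_i goes before the pivot whenever its weight
    against the pivot is at least 1/2 (the majority direction)."""
    if len(items) <= 1:
        return list(items)
    pivot = rng.choice(items)
    left = [i for i in items if i != pivot and w[i][pivot] >= 0.5]
    right = [i for i in items if i != pivot and w[i][pivot] < 0.5]
    return kwiksort(left, w, rng) + [pivot] + kwiksort(right, w, rng)

sigma_hat = kwiksort(list(range(4)), w)   # a candidate recommendation
```

Because the weights satisfy $w_{i,j} + w_{j,i} = 1$, the comparison against $\frac{1}{2}$ is exactly a majority test between the two directed weights of each pair, so each pivot step orders the remaining alternatives by their pairwise majority against the pivot.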