Predicting Strategic Voting Behavior with Poll Information
Roy Fairstein, Adam Lauz, Kobi Gal, Reshef Meir
Abstract
The question of how people vote strategically under uncertainty has attracted much attention in several disciplines. Theoretical decision models have been proposed which vary in their assumptions on the sophistication of the voters and on the information made available to them about others' preferences and their voting behavior. This work focuses on modeling strategic voting behavior under poll information. It proposes a new heuristic for voting behavior that weighs the success of each candidate according to the poll score against the utility of the candidate given the voter's preferences. The model weights can be tuned individually for each voter. We compared this model with other relevant voting models from the literature on data obtained from a recently released large-scale study. We show that the new heuristic outperforms all other tested models. The prediction errors of the model can be partly explained by inconsistent voters who vote for (weakly) dominated candidates.
Introduction

It is well accepted that people often vote strategically in political and other situations, taking into account not just their preferences, but also beliefs about how their vote would affect the outcome [7, 2]. Researchers in economics, political science, and more recently in the computational social choice community, have suggested various mathematical models to capture the strategic decision that a voter faces [6, 5, 8]. The gains from a certain action depend not only on the preferences of the voter, but also on the votes of others. Thus, part of the difficulty in predicting a voter's decision arises from the fact that there is uncertainty about others' voting decisions, i.e., about what can be inferred from the poll about the actual votes. Theoretical models describe this uncertainty in the following different ways, which can lead to different predictions of a voter's actions.

• Expected utility maximization. A rational voter maximizes her expected utility with respect to a probability distribution over the actions of the other voters. The distribution itself may be given exogenously (e.g., by a poll with known variance, as in our model), or derived via equilibrium analysis from the uncertain preferences of the other voters. Such models were developed mainly in the economics literature and are sometimes known as the "calculus of voting" [15, 11, 12].

• Voting heuristics. In these models the voter uses some (typically simple) function that states which action to take in any given situation. The voter is not assumed to be rational, and may not even have a cardinal utility measure or an explicit probabilistic representation of the different outcomes. For example, according to the 2-pragmatist heuristic, the voter behaves as if only the two candidates leading the poll are participating [14].

• Bounded rationality. These models present a mid-point between utility maximization and heuristics.
The voter makes a rational strategic decision based on a heuristic belief, rather than an accurate probabilistic belief. One example of such a model is local dominance [10], which assumes that each voter derives a set of possible outcomes based on a poll, and then selects a non-dominated action within these outcomes.

We evaluated the different models on data obtained from Tal et al. [16], who implemented several voting scenarios in controlled experiments involving humans, varying the number of voters, the poll information, and voters' preferences. Our main findings are that the AU model outperforms all other models in all scenarios. The k-pragmatist heuristic model, which considers only a limited number of candidates when making decisions, comes in second. The bounded-rational models obtained the worst performance.

Our first contribution is an extensive evaluation of various decision models on real-world data. We use the data of Tal et al. [16], where human subjects with dictated preferences are exposed to a poll over three candidates and make a single voting decision under the Plurality rule. This is the simplest possible setting that involves a nontrivial strategic decision. This is the first time that these models are tested against voting decisions with poll information; in fact, for some of them this is the first empirical test at all.

Our second contribution is a new heuristic voting rule, inspired by a similar model of Bowman et al. [4], that takes into account both the utility of a candidate and its attainability. The
Attainability-Utility (AU) decision model outperforms all other decision models we tested in predicting human votes. This contributes to the understanding of the factors that determine people's strategic voting, and can lead to new theories of voting behavior that combine rational and boundedly rational behavior.

Related work

We are not aware of another controlled experiment where voters face multiple strategic decisions with poll information. Yet, similar experiments were conducted in which groups of human players voted strategically with dictated preference profiles. Closest to our work is a recent paper by Tyszler and Schram [17], who showed that the strategic behavior of voters in the lab is consistent with a quantal best-response equilibrium. The main difference is that their subjects played a strategic game against other human players, and the information they had was the preferences of the other voters rather than poll information. Similar game-theoretic experiments along that line were conducted in [7, 2, 18]. In particular, these studies have shown that strategic voting in the lab increases with the amount of information subjects received about others' preferences and actions.

A different line of work in political science compared theoretical models against actual votes in political elections (using exit polls to obtain the truthful preferences). For example, Blais et al. [3] tested the calculus of voting model on empirical data from political elections, focusing on the voter's decision to vote or abstain. They concluded that the model has some explanatory power but is far from explaining the data completely, and did not compare it to other decision models. In contrast, Abramson et al. [1] concluded that voting behavior in US primary elections is consistent with the calculus of voting, but observed obvious strategic behavior only in ∼
13% of the voters, a bit higher than the fraction that seem to vote at random. In contrast to controlled experiments, such datasets typically contain few decisions by each voter (usually just one), and are thus insufficient to test decision models against individual behavior.
Preliminaries
In this section we provide the necessary background for our work. An (anonymous) score aggregation rule with m candidates C is a function f : ℕ^m → 2^C \ {∅}, mapping vectors of candidates' scores to a subset of winning candidates. In particular, the Plurality rule lets each voter vote for a single candidate, collects the total number of votes s ∈ ℕ^m, and selects f(s) = argmax_{c∈C} s(c).

We consider a single voter who faces a decision: to vote for one of several candidates C. The voter has a cardinal utility function u : C → ℝ, where u(c) is the utility for the voter if candidate c wins (a different utility for each candidate). The utility of a subset of winners W ⊆ C is u(W) = (1/|W|) Σ_{c∈W} u(c). Denote by U(C) the set of all utility functions over the set C. We denote by f(s + c) the outcome of the score vector s with one additional vote for c.

Prior to her vote, the voter is faced with poll information, which is a point estimate of the candidates' scores under the Plurality voting rule. Formally, the poll is a vector s ∈ ℕ^m, where s(c) is the number of voters expected to vote for candidate c. There is a joint probability distribution D ∈ Δ(ℕ^m × ℕ^m) over pairs of "real outcomes" and polls. The voter is not explicitly informed of this distribution.

A decision model (for Plurality with m candidates and a poll) is a function M : U(C) × ℕ^m → C, where M(u, s) ∈ C is the vote of a voter with utility function u, using decision model M, given a poll s. We use a superscript for the name of the decision model (e.g., M^Truth), and subscripts to denote voter-specific parameters, if relevant. We restrict our attention to deterministic decision models in this work. We demonstrate with two simple examples. First, the decision model of a voter who is always truthful regardless of the poll is M^Truth(u, s) = argmax_{c∈C} u(c).

Next, consider a rational voter who believes the poll to be a completely accurate representation of the other votes.
Such a voter can predict that the outcome of voting for c is f(s + c), and her decision will be M^BR(u, s) ∈ argmax_{c∈C} u(f(s + c)), i.e., a "best response" to the votes of the other voters (with some assumption on how to vote when there are multiple best responses).

For exposition, we introduce a running example with 5 candidates, and specify which candidate the voter will choose under every decision model.

Example 1 (Running example). The set of candidates is C = {q_1, ..., q_5}. A voter's utility is described by the vector u = (40, ...) (preferences are lexicographic). Poll scores are given by s = (25, ...). Figure 1 shows the scores of all candidates graphically. Both M^Truth and M^BR always select q_1.

In this section we briefly describe some decision-making models of voting behavior from the literature, one for each of the approaches specified above (heuristic, rational, and bounded-rational). In Section 3 we describe our decision model developed for this study, and in Section 4 we provide a detailed comparison of this model to the decision-making models below, as well as to several baseline models.

A priori, this distribution could be arbitrary, but in most realistic cases there is some correlation between the real score of a candidate and its score in the poll.
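The two baseline models above can be sketched in a few lines of Python. This is a minimal sketch: the function names, zero-based candidate indices, and the tie-breaking toward the more preferred candidate are our own illustrative choices, not taken from the paper.

```python
# Sketch of the two baseline decision models M^Truth and M^BR under Plurality.

def truthful_vote(u):
    """M^Truth: vote for the most preferred candidate, ignoring the poll."""
    return max(range(len(u)), key=lambda c: u[c])

def winners(s):
    """Plurality winner set f(s): all candidates with maximal score."""
    top = max(s)
    return [c for c in range(len(s)) if s[c] == top]

def utility_of_outcome(u, ws):
    """u(W): average utility over the (possibly tied) winner set W."""
    return sum(u[c] for c in ws) / len(ws)

def best_response_vote(u, s):
    """M^BR: treat the poll as the exact votes of the others and vote so
    that f(s + c) maximizes utility (ties broken toward the preferred
    candidate -- one possible assumption on multiple best responses)."""
    def value(c):
        s_plus = list(s)
        s_plus[c] += 1
        return (utility_of_outcome(u, winners(s_plus)), u[c])
    return max(range(len(u)), key=value)
```

For instance, a voter with (hypothetical) utilities u = [10, 5, 0] facing a poll s = [1, 4, 4] is truthful for candidate 0, but best-responds by compromising on candidate 1, which breaks the tie at the top in her favor.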
[Figure 1: The poll s from Example 1, and the candidate selected by each decision model: Prag with k = 2 and k = 4; CV with η = 8 and η = 10000; LD and LD+LB, each with two values of r.]

k-pragmatist

The first model we consider is the simple k-pragmatist heuristic [14]. Formally, let B_k(s) contain the k candidates with the highest scores in s; then the pragmatist decision model with parameter k selects the most preferred candidate among them, i.e., M^Prag_k(u, s) = argmax_{c∈B_k(s)} u(c). We allow k to be an individual parameter that differs from voter to voter.

In Figure 1 we see that for k = 2 the voter looks only at the two leading candidates, and therefore votes for the one she prefers among them. For k = 4, the voter considers all candidates except the one ranked last in the poll as possible winners, and therefore votes for her most preferred candidate.

Calculus of voting
The calculus of voting suggests that a voter always votes in a way that maximizes her expected utility [15, 12]. The complications of the model usually arise from the fact that the voter is assumed to know only the other voters' preferences, and uses an equilibrium model to predict their votes. However, we consider a simpler version where the distribution of votes is given exogenously [11].

Recall that we defined D as a joint distribution over actual scores and polls. We denote by D(s) the distribution over the actual scores, conditional on poll scores s. Denote by

P_{s,D}(x, y) = Pr_{s′∼D(s)}[ (f(s′) = {x} and f(s′ + y) = {x, y}) or (f(s′) = {x, y} and f(s′ + y) = {y}) ],

the probability that the voter is pivotal for y versus x when the poll is s; that is, that voting for y makes y a joint or unique winner. Then, the voter votes so as to maximize her expected utility:

M^CV_D(u, s) = argmax_{c∈C} Σ_{c′≠c} P_{s,D}(c′, c) (u(c) − u(c′)).

To make the CV model concrete, we need to pin it down to a specific distribution D. For this paper, we use the way that scores were generated in the experiment of Tal et al. [16]. Specifically, given a poll s of an n-voter population, the actual score vector s′ is obtained by sampling n votes from a multinomial distribution whose parameters are (s(c)/n)_{c∈C}. We use P_{s,η} as a shorthand for P_{s,D} when D(s) is a multinomial distribution with η voters, as explained above.
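The pivot probabilities P_{s,η} rarely need to be computed exactly; they can be estimated by straightforward Monte-Carlo sampling. The sketch below is our own illustration (function names and the tie-breaking toward the more preferred candidate are assumptions, not from the paper): it draws score vectors of η votes from the multinomial distribution above, accumulates the expected-utility gain of each vote over the pivotal events, and returns the argmax.

```python
import random

def pivot_vote_cv(u, s, eta, trials=20000, rng=None):
    """Monte-Carlo sketch of the M^CV_eta rule: sample eta votes with
    probabilities s(c)/n, detect the two pivotal-event types from the
    definition of P(x, y), and vote to maximize expected utility gain
    (ties broken toward the more preferred candidate)."""
    rng = rng or random.Random(0)
    m = len(s)
    gain = [0.0] * m
    for _ in range(trials):
        sprime = [0] * m
        for c in rng.choices(range(m), weights=s, k=eta):
            sprime[c] += 1
        top = max(sprime)
        lead = [c for c in range(m) if sprime[c] == top]
        for y in range(m):
            if len(lead) == 1 and y != lead[0] and sprime[y] == top - 1:
                # f(s') = {x} and f(s' + y) = {x, y}: voting y creates a tie
                gain[y] += u[y] - u[lead[0]]
            elif len(lead) == 2 and y in lead:
                # f(s') = {x, y} and f(s' + y) = {y}: voting y breaks the tie
                x = lead[0] if lead[1] == y else lead[1]
                gain[y] += u[y] - u[x]
    return max(range(m), key=lambda c: (gain[c], u[c]))
```

With a seeded generator the sketch is deterministic; for a voter with hypothetical utilities [10, 5, 0] and poll [1, 4, 4] with η = 9, the frequent ties between the two leaders make the compromise vote for candidate 1 the expected-utility maximizer.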
When η = n (i.e., the true number of voters), M^CV_η selects the candidate that exactly maximizes the voter's expected utility, since P_{s,n}(x, y) is the true probability that the voter is pivotal. However, the M^CV_η decision model allows for a more flexible, bounded-rational decision: when η < n the voter overestimates her true pivot probability, and thus her influence on the outcome, whereas η > n means that the voter underestimates her influence.

In Figure 1 we see that for η = 8 the voter believes she is pivotal with sufficiently high probability to substantially increase the chance of a more preferred candidate to win. However, for η = 10000, the voter believes that any tie except one between the two leading candidates is highly improbable, and therefore votes for the more preferred of the two.

Local dominance
Under the local dominance model [10, 9], the voter has an 'uncertainty parameter' r. Given a poll s with n participants, the voter considers as possible (without assigning any explicit probability) all score vectors s′ such that |s(c) − s′(c)| ≤ r·n for all c ∈ C. Then, the voter selects an undominated action (i.e., candidate) given this set of possible outcomes. Meir et al. [10] characterize the undominated candidates:

• Let W be all candidates whose score in s is at least max_{c∈C} s(c) − rn.
• If |W| ≥ 2, then the undominated candidates are all candidates in W except the least preferred one.
• If |W| = 1, then all candidates are undominated.

Denote by U(s, u, r) the set of undominated candidates in poll s for a voter with utility u and parameter r. The decision model of such a voter is

M^LD_r(u, s) = argmax_{c∈U(s,u,r)} u(c).

This assumes that the voter selects the most preferred undominated candidate, if more than one exists.

In Figure 1 we see that for r = 0.01 the voter believes that the poll is very accurate (the score of each candidate may change by at most rn = 3 votes), and there is only one possible winner. In this case, the voter remains truthful and M^LD_{0.01}(u, s) = q_1. When r = 0.
08, the voter believes that the poll is less accurate, and hence W contains the three leading candidates.

The Attainability-Utility heuristic

We suggest a new heuristic M^AU that separately evaluates the attainability (an approximation of the chance of success of each candidate according to the poll score) and the utility of each candidate given the voter's preferences. It selects the candidate that maximizes their weighted geometric mean. The heuristic is partly inspired by a rule that was used in the simulations of Bowman et al. [4] for multi-issue voting. Given a poll s, we compute the attainability of each candidate c similarly to [4]:

a_β(c, s) = 1 / (1 + exp(−β · (s(c)/n − 1/m))) ∈ [0, 1].

Then, for some small constant ε >
0, we define:

M^AU_{α,β}(u, s) = argmax_{c∈C} H^AU_{α,β}(u, s, c), where H^AU_{α,β}(u, s, c) = (ε + u(c))^α · (ε + a_β(c, s))^{2−α}.

(In [4], attainability was computed for each issue separately, and there were additional factors such as learning from the past. All factors were multiplied to obtain the heuristic attractiveness of the candidate.)

Figure 2: Attainability as a function of candidate poll scores for different values of β.

Intuitively, the α parameter trades off the relative importance of attainability and utility: α = 0 means the voter always selects the candidate with maximal score, and α = 2 means the voter is always truthful. The β parameter can be thought of as the accuracy of the poll in the eyes of the voter, similarly to the role of the parameter r in the LD model and η in the CV model. Figure 2 shows how β affects the attainability score a_β(c, s). Candidates that are tied have the same attainability. A high β means that a small advantage in score translates to a large gap in attainability.

Table 1 shows the AU model's behavior on Example 1 with different parameters. For α close to 2, the voter tends to be truthful; however, when there is a big gap in votes between candidates, β determines which of the top preferences is chosen: when β is big, the gap causes a bigger difference in the heuristic scores and pushes the voter toward the more attainable candidate, whereas when β is small, the gap is less taken into account and the voter votes for the more preferred one. In contrast, when α is close to 0, the model considers the poll as more important: when β is large, small changes in the poll have more effect on the decision and only the leading candidates have non-negligible attainability; for small β the differences in the poll have less effect, and the voter's fourth preference is chosen. Notice that no matter what the parameters are, the model will never choose to vote for the two trailing candidates, since they
Notice that nomatter what the parameters are, the model will never choose to vote for q or q , since theyare each dominated by another candidate with higher score and utility. H ( q ) H ( q ) H ( q ) H ( q ) H ( q ) M AU α,β ( u, s ) α = 1 . , β = 30 382.9 q α = 1 . , β = 10 q α = 0 . , β = 30 ≈ ≈ q α = 0 . , β = 10 0.16 0.77 0.11 q Table 1: M AU α,β heuristic score and decision in Example 1, for various parameter values. This is the only case where ε > u ( c ) may be 0. Methodology
Dataset
We evaluated the different models on data obtained from Tal et al. [16], who implemented several voting scenarios in controlled experiments involving humans. Some of this data is publicly available at votelib.org. The data was obtained from 595 distinct subjects. Each subject played up to 20 rounds of voting with 3 candidates, each time with different preferences and poll information. The poll provided a noisy indication of the results of the voting. The voting instances can be divided into six different "scenarios" corresponding to different orders of the candidates' scores in the poll once preferences are held fixed (see the two leftmost columns in Table 2). We denote the candidates by Q for the most preferred, Q′ for the second, and Q′′ for the least preferred. The reward was 10¢ for each round where Q was elected, 5¢ for Q′, and 0¢ for Q′′. Note that only in scenarios E and F, where Q is ranked last in the poll, may the voter have a monetary incentive to vote for Q′; she never has an incentive to vote for Q′′.

In addition to our decision model M^AU_{α,β}, we evaluate the following single-parameter decision models described in Section 2.1: M^Prag_k, M^CV_η, and M^LD_r. To these models, we add several other baselines.

Voter type based model
Tal, Meir and Gal [16] identified three distinct types of voter behavior, albeit without suggesting an explicit decision model:

1. Voters who are always truthful (TRT voters, about 10%-15% of subjects);
2. Voters who often compromise when Q is ranked last (CMP voters, about 40% of subjects), and otherwise vote truthfully;
3. Voters who often compromise AND select the leader Q′ when it is ranked first (LB voters, about 50% of subjects).

They also identified a subgroup of subjects who select unjustified actions (a candidate c for which there is a c′ that is both more preferred and higher-ranked) more than once. The behavior of these voters (about 5%-10% of the dataset) is naturally harder to predict for any decision model. We analyze the results for all subjects, but return to the issue of unjustified actions and voters in Section 5.3.

Based on this distinction of types, we consider the simple TMG decision model M^TMG_T. The parameter T ∈ {TRT, CMP, LB} is the voter type. It is defined as follows:

• M^TMG_TRT(u, s) = M^Truth(u, s) = Q;
• M^TMG_CMP(u, s) = Q′ if Q is ranked last in s, and Q otherwise;
• M^TMG_LB(u, s) = Q′ if Q′ is ranked first in s, and M^TMG_LB(u, s) = M^TMG_CMP(u, s) otherwise.

Local-Dominance with Leader bias
Note that the findings of [16] indicate a strong tendency to bias toward the leader of the poll, which is not taken into account in the local dominance model. We thus consider a "leader-biased" variation of the local dominance model: M^{LD+LB}_r(u, s) = M^LD_r(u, s) if |W| ≥ 2, and otherwise M^{LD+LB}_r(u, s) is the single candidate in W.
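Both variants follow directly from the characterization of undominated candidates. A minimal sketch (function names are our own; utilities are assumed distinct, so ties in preference need no special handling):

```python
def ld_possible_winners(s, r):
    """W: candidates whose poll score is within r*n of the leader's."""
    n, top = sum(s), max(s)
    return [c for c in range(len(s)) if s[c] >= top - r * n]

def local_dominance_vote(u, s, r):
    """M^LD_r: most preferred undominated candidate.
    If |W| >= 2, the undominated candidates are W minus its least
    preferred member; if |W| = 1, every candidate is undominated."""
    W = ld_possible_winners(s, r)
    if len(W) >= 2:
        worst = min(W, key=lambda c: u[c])
        undominated = [c for c in W if c != worst]
    else:
        undominated = list(range(len(u)))
    return max(undominated, key=lambda c: u[c])

def ld_leader_bias_vote(u, s, r):
    """M^{LD+LB}_r: as LD when |W| >= 2, otherwise vote for the single
    possible winner (the poll leader)."""
    W = ld_possible_winners(s, r)
    if len(W) >= 2:
        return local_dominance_vote(u, s, r)
    return W[0]
```

For hypothetical utilities [10, 5, 0] and a lopsided poll [2, 3, 15] with small r, LD leaves the voter truthful while LD+LB makes her join the runaway leader.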
In Figure 1 we see that this model acts similarly to the LD model; however, when there is only one possible winner, this model allows the voter to be leader-biased and vote for her fourth preference instead of being truthful.

Black-box neural network predictor
Another baseline we used was a general black-box classifier. We extracted about 30 relevant features, including the poll scores, the differences in poll scores, the voter's utility, and the voter type as identified in [16]. The "decision model" M^NN then feeds all features to the classifier, which predicts an action in C. The classifier consisted of a single-hidden-layer feed-forward neural network. The input nodes represented features that summarized voters' preferences, the poll information that was provided to them, and information about the voter types. The classifier was implemented using the nnet package of R.

Prediction and parameter fitting
The prediction was performed using the leave-one-out method. For each voter we excluded one of her rounds, one by one. Using the rest of the rounds, we learned the relevant model parameters and predicted what the voter would do in the excluded round.
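As an illustration of this procedure for the AU model, the sketch below fits (α, β) per voter by grid search under leave-one-out. The candidate grids, the ε value, and all function names are our own assumptions, not taken from the paper.

```python
import math

EPS = 1e-6  # stands in for the small constant epsilon in the AU model

def attainability(c, s, beta):
    """a_beta(c, s) = 1 / (1 + exp(-beta * (s(c)/n - 1/m)))."""
    n, m = sum(s), len(s)
    return 1.0 / (1.0 + math.exp(-beta * (s[c] / n - 1.0 / m)))

def au_vote(u, s, alpha, beta):
    """M^AU: argmax of (eps + u(c))^alpha * (eps + a_beta(c, s))^(2 - alpha)."""
    def h(c):
        return (EPS + u[c]) ** alpha * (EPS + attainability(c, s, beta)) ** (2.0 - alpha)
    return max(range(len(u)), key=h)

def fit_au_leave_one_out(rounds, alphas, betas):
    """Leave-one-out evaluation for one voter: for each held-out round,
    pick (alpha, beta) by grid search on the remaining rounds, predict
    the held-out vote, and count correct predictions.
    `rounds` is a list of (u, s, vote) triples."""
    correct = 0
    for i, (u, s, vote) in enumerate(rounds):
        train = rounds[:i] + rounds[i + 1:]
        best = max(
            ((a, b) for a in alphas for b in betas),
            key=lambda ab: sum(au_vote(tu, ts, *ab) == tv for tu, ts, tv in train),
        )
        correct += au_vote(u, s, *best) == vote
    return correct
```

For instance, a hypothetical voter with utilities [10, 5, 0] who always votes truthfully is matched by α = 2 (utility dominates), and all of her held-out rounds are then predicted correctly; with α = 0.5 the heuristic would instead compromise on a strong poll leader.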
Confusion matrices
The predictions of a specific decision model result in a confusion matrix: the entry A(x, y) in the matrix specifies how many times the model predicted x while the actual voter action was y (a matrix where all off-diagonal entries are 0 indicates perfect prediction). Both rows and columns are sorted Q, Q′, Q′′. For example, in one confusion matrix from our data, 441 samples (4.7% of the data) were cases where the studied model predicted Q′ but the voter selected Q.

Performance measures
From the confusion matrix we compute standard measures for multi-class prediction problems [13]. These include precision and recall, as well as the f-measure, which is the harmonic mean of precision and recall, for every candidate c ∈ C:

prec(c) = A(c, c) / Col_A(c);  recall(c) = A(c, c) / Row_A(c);  F(c) = 2 prec(c) · recall(c) / (prec(c) + recall(c)),

where Col_A(c) = Σ_{c′∈C} A(c′, c) and Row_A(c) = Σ_{c′∈C} A(c, c′). Since there are three possible actions, we calculate a single f-measure by weighting the f-measure of each action by the number of times this action was played:

F_A = (1/‖A‖) Σ_{c∈C} Row_A(c) F(c).

In the example matrix above, F(Q′) = 0.87.

https://cran.r-project.org/web/packages/nnet/index.html

Results and Analysis
Table 2 shows the f-measure of each decision model. We emphasize that the individual parameters of each voter were learned using leave-one-out to avoid overfitting. The results are separated by poll scenario, as each scenario reflects a different strategic decision. The f-measures are also presented graphically as the solid bars in Figure 3.

[Table 2: f-measure of each decision model (AU, LD, LD+LB, CV, Prag, TMG, NN) in each scenario: A (Q > Q′ > Q′′, ~15% of instances), B (Q > Q′′ > Q′, ~11%), C (Q′ > Q > Q′′, ~14%), D (Q′′ > Q > Q′, ~16%), E (Q′ > Q′′ > Q, ~22%), F (Q′′ > Q′ > Q, ~19%).]

From Table 2 and Figure 3 we can derive the following insights:

• In Scenarios A and B, all decision models (except NN) predict that voters are always truthful, and thus have the same high performance.
• In all of Scenarios C-F, the AU model outperforms all other models.
• The "sophisticated" bounded-rational models CV and LD have the worst performance. In particular, they demonstrate poor performance in Scenario C, where voters' decisions are influenced by leader bias [16].
• The k-pragmatist heuristic performs surprisingly well, considering its utter simplicity and the fact that it only allows three types of voters (for k = 1, 2, 3).
• The LB variant of local dominance strictly improves on plain local dominance, placing it roughly on par with k-pragmatist and the neural network predictor.
• Scenario F is the most difficult one for almost all models, with even the best models attaining a relatively low f-measure.

The data we use to fit the parameters of each voter is sparse. Each voter has at most 20 samples, and in some scenarios only 1 or 2 samples (or none at all). Therefore, even leaving out a single sample may significantly hurt performance. In order to find the maximum explanatory power of each model, we re-calculated the f-measure for each model using the entire dataset both as a training set and a test set. Clearly this approach is susceptible to overfitting, so it only provides an upper bound on the prediction ability of each model. These upper bounds appear as striped bars in Fig. 3. Note that our AU model still outperforms all other models when we compare upper bounds (with the slight exception of the NN model in Scenarios A and B).
The numerical upper bounds can be found in Table 3 in the appendix.

Figure 3: The f-measure of each decision model in all scenarios. The solid bars show the prediction performance, whereas the striped bars show the upper bound on the performance of each model.

Next, we dig deeper into the results. We want to see what kinds of mistakes the AU decision model tends to make, for example, whether these mistakes concentrate on a specific subset of subjects or scenarios. These insights could be used later to improve the model, and to design further experimental evaluation.

Errors by poll size
First, the model seems to perform equally well for all poll sizes (see Table 4 in the appendix), even for sizes as different as n = 8 and n = 10000. We thus conclude that the size of the poll is not a major factor in explaining the prediction errors.

[Figure 4: Confusion matrices of the AU model for scenarios C (Q′ > Q > Q′′), D (Q′′ > Q > Q′), E (Q′ > Q′′ > Q), and F (Q′′ > Q′ > Q).]
Errors by type
Figure 4 shows the confusion matrices of the AU decision model in all the "interesting" scenarios C-F. Recall that all the off-diagonal entries indicate prediction errors, where the column is the predicted action (Q, Q′ or Q′′, in that order) and the row is the action of the subject. As can be seen, most of the prediction errors in Scenario C are due to under-prediction of voting for the leader Q′. In contrast, most of the errors in Scenario E are due to over-prediction of a strategic compromise Q′. We can also see why Scenario F is the hardest, as all three actions are played frequently.

Errors by subject
Every decision model can capture the behavior of some human subjects better than others. To check how well different subjects are predicted, we computed the confusion matrices and f-measure for each of the 595 subjects, when actions are predicted by M^AU. An f-measure of 1 means that all actions of a subject were predicted correctly.

Figure 5 shows the distribution of subjects' individual f-measures. We can see that about 46% of the subjects are predicted very well (f-measure over 0.9), 29% are predicted reasonably well (f-measure over 0.8), and the remaining subjects (about 25%) have an f-measure of less than 0.8. This means that most of the prediction errors are due to a relatively small subset of subjects. One possible explanation is that these are the subjects who played fewer games, so that it is harder for the model to learn their parameters; however, we get a similar distribution after omitting subjects who played under 10 games. The main question is thus whether voters whose behavior is not predicted well follow a different decision model than AU, or are somehow inherently unpredictable.

Figure 5: A histogram showing, for every f-measure F, how many subjects have an f-measure of F.

Inherently inconsistent behavior
To answer the above question, we considered types of behavior that would be 'inherently unpredictable.' For example, [16] categorized a sample (s, a) (action a ∈ C under poll s) as "unjustified" if there is another candidate a′ that 'dominates' the selected action a (a′ is more preferred than a and s(a′) ≥ s(a)). They showed that voters with at least two unjustified actions have a random component in their behavior.

We suggest an additional criterion that is based on inconsistency among a voter's own actions. We say that a sample (s, a) is inconsistent if there is another sample (s*, a*) of the same voter, such that: (i) a* ≠ a; (ii) s*(a) ≥ s(a); and (iii) s*(a′) ≤ s(a′) for all a′ ≠ a. In words, a is in a weakly better position in s* than in s, but the voter still prefers to vote for another candidate a*.

Figure 6 (left) shows the f-measure of all voters, classified by their consistency type. We can see that the left tail of the histogram (i.e., almost all voters with low f-measure) consists of voters who are either "unjustified" or "inconsistent."

Figure 6: Left: f-measure per voter, for voters with more than 10 samples. We divided the population into "unjustified" voters who played at least two unjustified actions; "inconsistent" voters; and all others.
Right: Prediction accuracy in each scenario, divided by action type.

This might suggest that prediction cannot be significantly improved. We thus tested how many of the prediction errors themselves were of dominated/inconsistent actions. This can be seen in Figure 6 (right). The plaid gray bars represent all prediction errors that cannot be explained away as dominated or inconsistent actions. We conclude that while most of the prediction errors are indeed due to voters who sometimes behave inconsistently, most of the actual errors are "plaid", especially in Scenario F. Thus there is still room for improvement of our decision models.
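The inconsistency criterion above is mechanical enough to state in code. The following sketch (our own naming; it compares raw poll scores across samples, exactly as in conditions (i)-(iii)) flags a sample as inconsistent:

```python
def is_inconsistent(sample, others):
    """Return True if sample (s, a) is inconsistent with some other
    sample (s_star, a_star) of the same voter: the chosen candidate a
    was in a weakly better position in s_star, yet the voter chose a
    different candidate there."""
    s, a = sample
    for s_star, a_star in others:
        if a_star == a:
            continue  # condition (i): a different action was taken
        if s_star[a] >= s[a] and all(
            s_star[c] <= s[c] for c in range(len(s)) if c != a
        ):
            return True  # conditions (ii) and (iii) hold
    return False
```

For example, a voter who chose candidate 0 under poll (5, 9, 9) but chose candidate 1 under (6, 8, 8), where candidate 0 stood weakly better on every coordinate, is inconsistent; if the second poll had been (4, 10, 9) instead, condition (ii) would fail and the pair would be consistent.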
It seems that the Attainability-Utility heuristic explains quite well the behavior of most subjects in the data, except those with inherent inconsistencies in their replies. To improve the model, we can perform more experiments that use different utilities for the candidates (different utility gaps, negative utility, etc.) and more than 3 candidates. Such experiments can expose behavior that does not occur in the current data. Interestingly, the NN black-box model sometimes successfully predicts "unjustified" actions, and we can try to understand when this is possible.

Since the α and β parameters correspond to natural cognitive inclinations, their distribution can in principle reveal important information on the types of strategic voters in the population. Unfortunately, it is hard to discern clear patterns in the parameter distribution. Indeed, it seems that the distribution of the β parameter is bi-modal with peaks at 15 and 30 (see appendix). However, this is probably an artifact of the way we select parameters for subjects with a large range of optimal parameters (like truthful voters). More experimentation under different conditions is required if we want to better understand the population structure. One pattern that does stand out is that α values seem to be much higher in the 'small n' condition. This may suggest that not only the relative attainability matters: when n is low and the voter has a substantial chance to be pivotal, the importance of utility increases.

References

[1] Paul R. Abramson, John H. Aldrich, Phil Paolino, and David W. Rohde. Sophisticated voting in the 1988 presidential primaries. American Political Science Review, 86(1):55–69, 1992.

[2] Anna Bassi. Voting systems and strategic manipulation: an experimental study. Technical report, mimeo, 2008.

[3] André Blais, Robert Young, and Miriam Lapp. The calculus of voting: An empirical test.
European Journal of Political Research , 37(2):181–201, 2000.[4] Clark Bowman, Jonathan K Hodge, and Ada Yu. The potential of iterative voting tosolve the separability problem in referendum elections.
Theory and decision , 77(1):111–124, 2014.[5] Samir Chopra, Eric Pacuit, and Rohit Parikh. Knowledge-theoretic properties of strate-gic voting. Presented in JELIA-04, Lisbon, Portugal, 2004.[6] Vincent Conitzer, Toby Walsh, and Lirong Xia. Dominating manipulations in votingwith partial information. In
AAAI’11 , pages 638–643, 2011.[7] Robert Forsythe, Thomas Rietz, Roger Myerson, and Robert Weber. An experimentalstudy of voting rules and polls in three candidate elections.
International Journal ofGame Theory , 25(3):355–383, 1996.[8] Umberto Grandi, Andrea Loreggia, Francesca Rossi, Kristen Brent Venable, and TobyWalsh. Restricted manipulation in iterative voting: Condorcet efficiency and bordascore. In
ADT’13 , pages 181–192. Springer, 2013.[9] Reshef Meir. Plurality voting under uncertainty. In
AAAI’15 , 2015.[10] Reshef Meir, Omer Lev, and Jeffrey S. Rosenschein. A local-dominance theory of votingequilibria. In
ACM-EC’14 , pages 313–330, 2014.[11] Samuel Merrill. Strategic decisions under one-stage multi-candidate voting systems.
Public Choice , 36(1):115–134, 1981.[12] Roger B. Myerson and Robert J. Weber. A theory of voting equilibria.
The AmericanPolitical Science Review , 87(1):102–114, 1993.[13] David Martin Powers. Evaluation: from precision, recall and f-measure to roc, in-formedness, markedness and correlation.
International Journal of Machine LearningTechnology , 2(1):37.[14] Annemieke Reijngoud and Ulle Endriss. Voter response to iterated poll information.In , pages 635–644, 2012.[15] William H Riker and Peter C Ordeshook. A theory of the calculus of voting.
Americanpolitical science review , 62(1):25–42, 1968.[16] Maor Tal, Reshef Meir, and Ya’akov (Kobi) Gal. A study of human behavior in onlinevoting. In
Proceedings of the 2015 International Conference on Autonomous Agents andMultiagent Systems, AAMAS 2015, Istanbul, Turkey, May 4-8, 2015 , pages 665–673,2015. Full version available from https://tinyurl.com/yczxugoj .17] Marcelo Tyszler and Arthur Schram. Information and strategic voting.
Experimentaleconomics , 19(2):360–381, 2016.[18] Karine Van der Straeten, Jean-Fran¸cois Laslier, Nicolas Sauger, and Andr´e Blais.Strategic, sincere, and heuristic voting under four election rules: an experimental study.
Social Choice and Welfare , 35(3):435–472, 2010.
A Additional Results

scenario   AU     LD     LD+LB  CV     Prag   TMG    NN
A,B        0.903  0.903  0.903  0.903  0.903  0.903  0.922
C          0.870  0.389  0.819  0.389  0.703  0.693  0.741
D          0.784  0.657  0.719  0.657  0.730  0.657  0.733
E          0.813  0.676  0.795  0.730  0.766  0.659  0.744
F          0.728  0.525  0.609  0.533  0.640  0.411  0.638
total      0.841  0.671  0.795  0.708  0.781  0.720  0.784

Table 3: Upper bounds on the performance of each model.

f-measure  total  scenario C  scenario D  scenario E  scenario F
n < 10     0.775  0.747       0.767       0.735       0.534
n ≈ 100    0.739  0.710       0.653       0.746       0.588

B Black box classifier
The "decision model" M_NN is a single-hidden-layer feed-forward neural network classifier. It consists of an input layer whose nodes represent the features in the dataset, a "hidden" layer, and an output layer that uses the softmax activation function to output a classification into one of the possible classes in C. Data flows in one direction from input to output, with no recurrences, in contrast to recurrent neural networks. The input nodes are connected to the output nodes through the hidden-layer nodes (neurons) by weighted edges. The learning algorithm is supervised: it takes a set of class-labeled records and iteratively adjusts the weights on the edges by comparing the output for each input record to its class label. Because training is iterative, the model can easily be updated with records it has not yet seen. We use a configuration of 3 units in the hidden layer. In our domain the vote records consist of raw and generated features; some of the generated features are normalized by the number of votes in the poll configuration. The class label of each record is the selected preference of the voter, which is one of {Q, Q′, Q′′}. Using feature selection techniques we selected the following features:

(a) Poll and preference information:
candidates' poll votes, normalized poll gaps between candidates, the preference order, the normalized gap between the leader and the most preferred candidate, and the scenario, which is the combination of the preference order and the poll information.

(b) Voter information: A-ratios, which are the number of rounds the voter selected action A divided by the number of rounds it was available, where A is the action we determine from the selected preference (Q, Q′ or Q′′) and the order of the preferences in the poll (namely the scenario). We also use the voter type feature, which is one of {TRT, LB, OTHER} and is determined by threshold values over the A-ratios (for example, if the TRT-ratio exceeds the threshold, the type is TRT, i.e., Truthful).

Figure 7: A scatter plot of the α and β parameters of each of the 595 voters.

Figure 8: Distribution of the α parameter for n < 10 (left) and n ≥ 100 (right).

Figure 9: Distribution of the β parameter for n < 10 (left) and n ≥ 100 (right).

Roy Fairstein, Ben Gurion University, Israel. Email: [email protected]
Adam Lauz, Ben Gurion University, Israel. Email: [email protected]
Kobi Gal, Ben Gurion University, Israel. Email: [email protected]
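As an illustration of the architecture described in Appendix B, a single-hidden-layer feed-forward classifier with 3 hidden units and a softmax output over the three classes {Q, Q′, Q′′} can be sketched as below. This is a hedged re-implementation: the feature count, the randomly initialized weights, and the tanh hidden activation are placeholder assumptions, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 5   # placeholder: depends on the selected features
n_hidden = 3     # as in the text: 3 hidden units
n_classes = 3    # one class per possible vote: Q, Q', Q''

# Randomly initialized weights stand in for the trained parameters.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

def softmax(z):
    z = z - z.max()  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict(x):
    """One feed-forward pass: input -> hidden (tanh assumed) -> softmax."""
    h = np.tanh(x @ W1 + b1)
    return softmax(h @ W2 + b2)

probs = predict(rng.normal(size=n_features))
print(probs)  # a probability distribution over {Q, Q', Q''}, summing to 1
```

Training would adjust W1, b1, W2, b2 by comparing each record's output distribution to its class label, as the supervised procedure in the text describes.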