Electoral Accountability and Selection with Personalized News Aggregation

Abstract

We study a model of electoral accountability and selection (EAS) in which voters with heterogeneous horizontal preferences pay limited attention to the incumbent's performance using personalized news aggregators. Extreme voters' aggregators exhibit an own-party bias, which hampers their abilities to discern good and bad performances. While this effect alone could undermine EAS, there is a countervailing effect stemming from partisan disagreements, which make the centrist voter pivotal and could potentially enhance EAS. Overall, increasing mass polarization and shrinking attention spans have ambiguous effects on EAS, whereas correlating voters' news signals unambiguously improves EAS and voter welfare.



Anqi Li∗, Lin Hu†, Ilya Segal‡

This draft: November 2020

∗ Department of Economics, Washington University in St. Louis.
† Research School of Finance, Actuarial Studies and Statistics, Australian National University.
‡ Department of Economics, Stanford University.

Introduction

Recently, academics and the popular press have put forward the idea that tech-enabled news personalization could have significant political consequences (Sunstein (2009); Pariser (2011); Gentzkow (2016); Obama (2017)). This paper studies how personalized news aggregation for rationally inattentive voters affects society's ability to motivate and retain talented politicians through elections.

Our premise is that in the digital era, rational demand for news aggregation is driven by limited attention capacities. More and more people get their news online, where the amount of available information (2.5 quintillion bytes) vastly exceeds what any individual can process in a lifetime. Consumers must therefore rely on news aggregators to aggregate content for them. This content is personalized based on their reported preferences and needs, demographic and psychographic attributes, digital footprints, and social network positions. President Obama raised the concern that the growing use of personalized news aggregators could hamper society's ability to hold elected officials accountable. In his farewell speech he asserted that "For too many of us, it's become safer to retreat into our own bubbles, especially our social media feeds ... and never challenge our assumptions ... How can elected officials rage about deficits when we propose to spend money on preschool for kids, but not when we're cutting taxes for corporations? How do we excuse ethical lapses in our own party, but pounce when the other party does the same thing? ... this selective sorting of facts; it is self-defeating."

In this paper, we study how personalized news aggregation for rational voters with limited attention capacities affects electoral accountability and selection (hereafter EAS). Our analysis is carried out in a standard model of policymaking and elections. At the outset, a candidate (R) assumes office and privately observes his ability, which is either high or low.
News aggregators (e.g., aggregator sites, feed reader apps) sift through myriad online sources and direct readers to the stories they might find interesting. They have recently gained prominence as more people get news online, from social media, and through mobile devices (Matsa and Lu (2016)). The three most popular news websites in 2019, Yahoo! News, Google News, and Huffington Post, are all aggregators. See Athey and Mobius (2012), Athey, Mobius, and Pal (2017), and Chiou and Tucker (2017) for background reading and literature surveys.

The incumbent then chooses an effort level, which generates performance data, and runs for re-election against a challenger (L). Voting is expressive and forward-looking, i.e., each voter votes for the candidate who generates the highest expected future payoff to him. The election outcome is determined by simple majority rule, and the winner earns an office rent.

To create a role for personalized news aggregation, we depart from the representative voter paradigm and work instead with multiple voters who have heterogeneous horizontal preferences. A voter's valuation of the incumbent relative to the challenger equals his horizontal preference parameter plus the difference in the two candidates' abilities. Before the election takes place, voters build personalized news aggregators. These are modeled as signal structures that aggregate raw performance data into news content. Consuming news improves expressive voting decisions but incurs an attention cost that is posterior separable (Caplin and Dean (2013); Caplin, Dean, and Leahy (2019)). Voters have limited bandwidths but are otherwise free to specify any signal structure. This freedom reflects the flexibility to assemble RSS feed readers themselves, or to choose between multiple platforms that compete for customers' attention and eyeballs. A personalized news aggregator maximizes a voter's expressive voting utility, subject to his bandwidth constraint.

Personalized news exhausts voters' bandwidths with binary recommendations as to which candidate to vote for.
Indeed, any information beyond voting recommendations would only raise the attention cost without any corresponding benefit and is thus wasteful. Moreover, a voter must strictly obey the voting recommendations given to him. Otherwise he would have a (weakly) preferred candidate that is independent of his voting recommendations, and he could obtain the same expressive voting utility without paying attention, let alone exhausting his bandwidth.

We consider a symmetric environment featuring a left-wing voter, a centrist voter, and a right-wing voter. The personalized news for the centrist voter is unbiased. That of extreme voters, however, exhibits an own-party bias, i.e., it recommends the voter's own-party candidate more often than his opposite-party candidate. An extreme voter would always vote along party lines without paying attention. Paying attention is therefore useful only if it sometimes convinces him to vote across party lines. Such a recommendation must be very strong. To stay within the voter's bandwidth limit, it must also be very rare (hereafter occasional big surprise), so most of the time the recommendation is to vote for the voter's own-party candidate. The empirical literature has documented evidence for own-party bias and occasional big surprise (Fiorina and Abrams (2008); DellaVigna and Gentzkow (2010); Flaxman, Goel, and Rao (2016); Gentzkow (2016)).

To illustrate how personalized news aggregation could affect EAS, suppose voters' population distribution is sufficiently dispersed that each voter is pivotal with positive probability. Consider two events. In the first event, extreme voters agree on which candidate to vote for. The incentive power generated by their news aggregators (i.e., their ability to discern good and bad performances) then determines society's ability to uphold EAS.
In the second event, extreme voters disagree about which candidate to vote for. The centrist voter is then pivotal and contributes to EAS through the incentive power generated by his news aggregator. In recent years, the disagreement between extreme voters, or partisan disagreement, has risen sharply across a wide range of issues such as abortion, global warming, gun policy, immigration, and same-sex marriage (Fiorina and Abrams (2008); Gentzkow (2016); Carroll, Kiley, and Asheer (2019)).

Our comparative statics results exploit the trade-off between the incentive power and the partisan disagreement generated by extreme voters' news aggregators. We first find that increasing extreme voters' horizontal preference parameter magnifies their own-party biases and reduces the incentive power generated by their news aggregators. This effect alone would undermine EAS. There is, however, a countervailing effect stemming from partisan disagreement, which arises more frequently as extreme voters become more partisan. The combined effect on EAS could be positive if the centrist voter's news signal is very informative about the incumbent's performance. Likewise, reducing extreme voters' bandwidths undermines the incentive power generated by their news aggregators. It could also magnify partisan disagreement, however, and in turn enhance EAS. Together, these results paint a nuanced picture. Factors that usually carry a negative connotation in everyday political discourse could prove conducive to EAS; examples are increasing mass polarization (Fiorina and Abrams (2008)) and shrinking attention spans (Teixeira (2014); Dunaway (2016); Prior (2017)). Conversely, well-intentioned attempts by Allsides.com to battle the rising polarization by feeding extreme voters unbiased viewpoints could undermine EAS.
Interestingly, correlating voters' news signals, if done appropriately, unambiguously improves EAS without affecting voters' expressive voting utilities. This suggests that well-conceived coordination, if not consolidation, between news aggregator sites could enhance voter welfare.

Rational inattention

The growing literature on rational inattention (hereafter RI), pioneered by Sims (1998) and Sims (2003), has recently been surveyed by Caplin (2016) and Maćkowiak, Matějka, and Wiederholt (2018). The current analysis is carried out in a standard setting where decision-makers can aggregate source data into any signal by paying a posterior-separable attention cost. The flexibility of RI information aggregation plays a crucial role here. It also plays a crucial role in recent studies of electoral competition with RI voters (Matějka and Tabellini (2016); Hu, Li, and Segal (2019)). Posterior separability has recently received attention from economists for several reasons:
- its axiomatic and revealed-preference foundations (Caplin and Dean (2015); Zhong (2017); Denti (2018); Tsakas (2019));
- its connections to sequential learning (Hébert and Woodford (2017); Morris and Strack (2017));
- its validation by lab experiments (Ambuehl, Ockenfels, and Stewart (2019); Dean and Neligh (2019)).

Flexibility is absent from models of rational ignorance. Downs (1957) coined this term, and political scientists have more recently used it to refer to rigid information acquisition, e.g., drawing a signal from a given probability distribution.

Media bias

The literature on media bias and its political consequences is thoroughly surveyed by Prat and Strömberg (2013), Strömberg (2015), and Anderson, Strömberg, and J. Waldfogel (2016). Calvert (1985) first proposed that even rational consumers can prefer biased news when their information processing capacities are constrained. Burke (2008), Suen (2004), and Che and Mierendorff (2019) later expanded on this idea. However, these authors consider non-RI information aggregation technologies and do not examine the consequences of news bias for EAS. In Hu, Li, and Segal (2019), news is aggregated by a monopolistic infomediary who maximizes voters' attention rather than their utilities.
Here, voters can aggregate news optimally, as in the standard RI paradigm.

In political science, the term own-party bias refers to the positive correlation between a person's party affiliation and his propensity to support his own-party candidate. Over the past decade, own-party bias has risen sharply without significant changes in voters' intrinsic political preferences (Fiorina and Abrams (2008); Gentzkow (2016)). Personalized news aggregation could make this trend persist. Occasional big surprise is a hallmark of Bayesian rationality, and DellaVigna and Gentzkow (2010) survey the evidence for it. Recently, Flaxman, Goel, and Rao (2016) find that using news aggregators increases one's own-party bias. It also increases one's opinion intensity when supporting the opposite-party candidate (i.e., occasional big surprise).

Electoral accountability

The literature on electoral accountability is surveyed by Ashworth (2012) and Duggan and Martinelli (2017). We enrich the existing theoretical insights into how voters' information and exposure to biased media affect electoral accountability. Most of these insights come from studying a representative voter facing exogenous information/media environments (Adachi and Hizen (2014); Ashworth and de Mesquita (2014); …). Egorov (2009) and Prat and Strömberg (2013) study models in which heterogeneous voters care about different aspects of the incumbent's policy and receive exogenous information about them. However, Egorov (2009) studies a pure moral hazard problem without adverse selection. Prat and Strömberg (2013) consider a setting different from ours, in which equilibrium policy depends on voter information through the fraction of informed members within each voter group.

Even if they did, their results could differ significantly from ours. Take the information aggregation technology considered by Suen (2004), which partitions realizations of a continuous state variable into two cells. The signal realizations it generates are monotone in voters' horizontal preferences: if a left-wing voter is recommended to vote for candidate R, then a right-wing voter must receive the same voting recommendation. Hence the median voter's signal determines the election outcome despite the plurality of voters and media.

In contrast, the non-Bayesian model of Mullainathan and Shleifer (2005) predicts a confirmatory bias but not any occasional big surprise.

Common agency

The literature on common agency games with moral hazard was pioneered by Bernheim and Whinston (1986) and later generalized by Peters (2001), among others. We examine a new special case of such games in which the allocation proposed by the principals (voters) to the agent (incumbent) is a profile of news aggregators. Khalil, Martimort, and Parigi (2007) and Gailmard (2009) also study common agency games with endogenous monitoring decisions. In their models, however, principals are homogeneous and monitoring technologies are rigid.

There is an incumbent R, a challenger L, and three voters $k \in K = \{-1, 0, 1\}$ with a total mass of one. Voter k's population is $f_k \in (0, 1)$, and his horizontal preference parameter is $v_k \in (-1, 1)$. The incumbent's ability is $\theta$, which is either high or low and has zero mean. A high-ability incumbent can exert high effort $a = 1$ at a cost $c > 0$ or low effort $a = 0$ at no cost. A low-ability incumbent can only exert low effort. The incumbent's effort choice $a \in \{0, 1\}$ is his private information. It generates performance data modeled as a binary random variable with full support $\Omega = \{-1, 1\}$ and p.m.f. $p_a$.

After that, an election takes place in which voters decide whether to re-elect the incumbent or replace him with the challenger. Re-election rather than replacement generates a payoff difference $v_k + \theta$ to voter k. This equals his horizontal preference parameter plus the incumbent's ability relative to the challenger, where the challenger's expected ability is normalized to zero. Voting is expressive, meaning that each voter votes for the candidate who generates the highest expected payoff to him. The election outcome is determined by simple majority rule with ties broken in favor of the incumbent, and the winning candidate earns one unit of office rent.

See also Dewatripont, Jewitt, and Tirole (1999) for the role of information in career-concern models. Existing studies on pandering consider information environments in which the incumbent signals competence through bad reputation building (see Canes-Wrone, Herron, and Shotts (2001), Maskin and Tirole (2004), and Prat (2005), among others). Here reputation is good.

Before the election takes place, voters build and use personalized news aggregators. These aggregate performance data into news content that is easy to process and useful for decision-making. A news aggregator for voter k is a finite signal structure $\Pi_k: \Omega \to \Delta(Z)$. Each $\Pi_k(\cdot \mid \omega)$ specifies a probability distribution over a finite set Z of signal realizations, conditional on the incumbent's performance state being $\omega \in \Omega$. To absorb the content generated by $\Pi_k$, the voter must pay an amount $I(\Pi_k)$ of attention that must not exceed his bandwidth $I_k > 0$. After that, the voter observes the signal realization, updates his belief about the incumbent's ability, and casts his vote.

The game sequence is as follows.
1. The incumbent observes his ability, makes an effort choice, and generates performance data.
2. Voters build personalized news aggregators without observing the moves in Stage 1.
3. Voters pay attention, observe the signal realizations generated by their news aggregators, and cast votes.

We consider a symmetric environment in which $v_{-k} = -v_k$, $f_{-k} = f_k$, and $I_{-k} = I_k$ for all $k \in K$. Voters $-1$, $0$, and $1$ satisfy $\mathrm{sgn}(v_k) = \mathrm{sgn}(k)$ and are called left-wing, centrist, and right-wing, respectively. News aggregation is a decentralized and uncoordinated activity among voters. To model this, we assume that the signals generated by different voters' news aggregators are conditionally independent for any given $\omega \in \Omega$ (Section 4 relaxes this assumption).

Our solution concept is perfect Bayesian equilibrium. An equilibrium sustains accountability if the high-ability incumbent exerts high effort. Accountability is sustainable if some equilibrium sustains it. We also define equilibrium selection as the expected ability of the elected official at the end of the game. It measures society's ability to retain high-ability incumbents and to replace low-ability incumbents with challengers. Our main research question is how personalized news aggregation affects equilibrium accountability and selection (EAS). Throughout the analysis, we assume that an incumbent who is indifferent between high and low effort exerts high effort.
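The voting stage can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper. The helper names `symmetric_masses` and `incumbent_wins` are hypothetical, and each voter group k is assumed to vote as a bloc. The sketch encodes simple majority rule with ties broken in favor of the incumbent, using the symmetric masses $f_{-1} = f_1 = (1 - f_0)/2$ implied by $f_{-k} = f_k$ and a total mass of one.

```python
from itertools import product


def symmetric_masses(f0: float) -> dict:
    """Masses f_k of voters k = -1, 0, 1 with f_{-1} = f_1 and total mass one."""
    if not 0.0 < f0 < 1.0:
        raise ValueError("f0 must lie in (0, 1)")
    f1 = (1.0 - f0) / 2.0
    return {-1: f1, 0: f0, 1: f1}


def incumbent_wins(votes: dict, masses: dict) -> bool:
    """Simple majority rule with ties broken in favor of the incumbent R.

    `votes` maps each voter k to "R" (re-elect) or "L" (elect the challenger).
    """
    r_mass = sum(masses[k] for k, choice in votes.items() if choice == "R")
    # Tiny tolerance so that an exact tie (mass 1/2) is not lost to rounding.
    return r_mass >= 0.5 - 1e-12


if __name__ == "__main__":
    for f0 in (0.2, 0.6):
        masses = symmetric_masses(f0)
        print(f"f0 = {f0}")
        for profile in product("LR", repeat=3):
            votes = dict(zip((-1, 0, 1), profile))
            winner = "R" if incumbent_wins(votes, masses) else "L"
            print("  votes (left, centrist, right) =", profile, "-> winner", winner)
```

With $f_0 = 0.2$, any two groups form a majority, so every voter can be pivotal. With $f_0 = 0.6$, the centrist decides alone.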

Performance data

We model the incumbent's performance data (e.g., economic indicators) as a binary random variable (see Appendix B for an extension to continuous random variables). The sole use of performance data is to make inferences about the incumbent's ability. It is therefore w.l.o.g. to identify each performance state $\omega \in \Omega$ with the incumbent's conditional expected ability $E[\theta \mid \omega, a_h = 1]$ in that state. Here $a_h = 1$ indicates that the expectation is taken under the assumption that the high-ability incumbent exerts high effort.

Assuming $\Omega = \{-1, 1\}$ has three consequences.
- First, we can interpret $\omega = 1$ as good performance and $\omega = -1$ as bad performance, since high effort makes $\omega = 1$ relatively more likely than $\omega = -1$, i.e., $p_0(1)/p_0(-1) < p_1(1)/p_1(-1)$.
- Second, in the case where $a_h = 1$, each $\omega \in \Omega$ must occur with probability 1/2, because the conditional expected abilities $\pm 1$ must average to $E[\theta] = 0$.
- Finally, $|v_k| < 1$ for all $k \in K$. Had $\omega$ been perfectly observable, all voters would therefore vote for the incumbent if $\omega = 1$ and for the challenger if $\omega = -1$. Horizontal preferences alone thus do not generate biased voting decisions.

If $|v_k| > 1$ for $k = \pm 1$, extreme voters would always vote along party lines. The centrist voter's vote would then determine the election outcome, which brings us back to the representative voter paradigm.

News aggregator

A news aggregator $\Pi: \Omega \to \Delta(Z)$ generates a distribution over content that shapes a voter's belief about the incumbent's performance (and hence his ability). In the case where $a_h = 1$, the aggregator outputs $z \in Z$ with probability
$$\pi_z = \sum_{\omega \in \Omega} \Pi(z \mid \omega) \cdot \tfrac{1}{2},$$
which is assumed to be strictly positive for all $z \in Z$. Then
$$\mu_z = \sum_{\omega \in \Omega} \omega \cdot \Pi(z \mid \omega) \,/\, (2\pi_z)$$
is the posterior mean of the incumbent's performance conditional on the signal realization being z. The tuple $\langle \pi_z, \mu_z \rangle_{z \in Z}$ fully summarizes the distribution of the posterior beliefs that one holds after consuming the content generated by $\Pi$.

Attention

We model attention as a scarce resource that reduces voters’ uncertaintiesabout the incumbent’s performance:

Assumption 1.

In the case where $a_h = 1$, the amount of attention needed to absorb the content generated by $\Pi: \Omega \to \Delta(Z)$ is
$$I(\Pi) = \sum_{z \in Z} \pi_z \cdot h(\mu_z), \qquad (1)$$
where the function $h: [-1, 1] \to \mathbb{R}_+$ is continuous on $[-1, 1]$ and twice differentiable on $(-1, 1)$. It satisfies the following properties:
(i) strict convexity on $[-1, 1]$ and $h(0) = 0$;
(ii) symmetry around zero;
(iii) $h(1) > I_k$ for all $k \in K$.

Equation (1) coupled with Assumption 1(i) is equivalent to weak posterior separability (WPS). Caplin and Dean (2013) proposed this notion to generalize Shannon entropy as a measure of attention cost. In the current context, WPS stipulates that processing null signals is costless. It also requires more attention for moving posterior beliefs closer to certainty, i.e., toward $\mu = \pm 1$ and away from the prior mean of zero.

Assumption 1(ii) and (iii) impose regularity on our problem. Assumption 1(ii) states that only the magnitude of the posterior mean affects the attention cost; its sign, which indicates the direction of belief updating, does not. This assumption simplifies our analysis, and a slight departure from it would not affect our qualitative predictions. Assumption 1(iii) says that absorbing the performance data without error would require more attention than voters' bandwidths allow. Voters must therefore garble performance data to stay within their bandwidth limits.

The requirement that $\pi_z > 0$ for all $z \in Z$ is equivalent to requiring that, for every $z \in Z$, $\Pi(z \mid \omega) > 0$ for some $\omega \in \Omega$.
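As a minimal sketch, the posterior summary $\langle \pi_z, \mu_z \rangle$ and the attention cost in equation (1) can be computed directly from a finite signal structure. The sketch uses the fact that each state has probability 1/2 when $a_h = 1$. The helper names are hypothetical, and the quadratic $h(\mu) = \mu^2$ is chosen purely for illustration; any h satisfying Assumption 1 could be passed instead.

```python
from typing import Callable, Dict, Tuple

# Pi[z] = (Pi(z | omega = 1), Pi(z | omega = -1)) for each signal realization z.
SignalStructure = Dict[str, Tuple[float, float]]


def posterior_summary(pi: SignalStructure) -> Dict[str, Tuple[float, float]]:
    """Return {z: (pi_z, mu_z)}, assuming each state omega = +/-1 has probability 1/2."""
    for col in (0, 1):
        if abs(sum(probs[col] for probs in pi.values()) - 1.0) > 1e-9:
            raise ValueError("Pi(. | omega) must sum to one in each state")
    summary = {}
    for z, (p_good, p_bad) in pi.items():
        pi_z = 0.5 * p_good + 0.5 * p_bad
        if pi_z <= 0.0:
            raise ValueError(f"signal {z!r} must have positive probability")
        mu_z = (1.0 * p_good + (-1.0) * p_bad) / (2.0 * pi_z)  # posterior mean of omega
        summary[z] = (pi_z, mu_z)
    return summary


def attention_cost(pi: SignalStructure, h: Callable[[float], float]) -> float:
    """Posterior-separable cost I(Pi) = sum_z pi_z * h(mu_z), as in equation (1)."""
    return sum(pi_z * h(mu_z) for pi_z, mu_z in posterior_summary(pi).values())


def quadratic(mu: float) -> float:
    """Illustrative choice of h (an assumption, not the paper's only admissible h)."""
    return mu * mu


if __name__ == "__main__":
    recommend = {"R": (0.8, 0.3), "L": (0.2, 0.7)}  # hypothetical aggregator
    print(posterior_summary(recommend))
    print("I(Pi) =", attention_cost(recommend, quadratic))
```

The posterior means average to zero, as Bayesian updating requires. A fully revealing structure costs $h(1)$, which Assumption 1(iii) places above every bandwidth. An uninformative structure costs nothing.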

Personalized news aggregator

We share the view of Prat and Strömberg (2013) that instrumental voting is an important motive for consuming political news. Our voters have limited bandwidths but can otherwise specify any signal structure. They can build RSS feed readers themselves, or choose between multiple platforms that compete for customers' attention and eyeballs. Voters cannot commit to signal structures before the incumbent moves, because third parties cannot easily detect changes in the algorithms behind modern news aggregators (Eslami et al. (2015)). A personalized news aggregator for a voter maximizes his expressive voting utility, subject to his bandwidth constraint.

Various foundations for posterior separability have been proposed since Shannon (1948). In Shannon's framework, voters ask a series of yes-or-no questions about the performance state at a constant marginal cost. The more questions a voter asks, the more precise his posterior belief about the performance state. According to Shannon (1948), the minimal average number of questions needed to implement a signal structure approximately equals the mutual information between the source data and the output signal. More recently, Hébert and Woodford (2017) and Morris and Strack (2017) study optimal stopping problems. There, a decision maker consults one of many sources sequentially and incurs a (time- and belief-dependent) flow cost until the process is randomly terminated. These authors provide general conditions under which the expected total cost is posterior separable in the continuous-time limit.

WPS and symmetry are satisfied by many commonly used attention cost functions. Two examples are $h(\mu) = \mu^2$ and $h(\mu) = H(1/2) - H((1 + \mu)/2)$, where H denotes the binary entropy function. In these cases, $I(\Pi)$ equals, respectively, the reduction in the variance and the reduction in the Shannon entropy of the performance state caused by consuming the content generated by $\Pi$.

See also Chan and Suen (2008), Matějka and Tabellini (2016), Perego and Yuksel (2018), and Prat (2018) for political models in which voters devote limited resources solely to consuming news that improves expressive voting decisions.

A major revenue source for modern news aggregators is display advertising. Ad revenue increases with the amount of attention customers pay: the more informational content a customer absorbs, the longer he stays on the platform and the more ads he sees. Facebook, for example, has deployed tactics to grab users' attention, such as playing short mid-roll ads when users are in the "lean-back" reading/watching mode.
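The two example cost functions mentioned above can be checked numerically. The sketch below evaluates equation (1) with $h(\mu) = \mu^2$ and with $h(\mu) = H(1/2) - H((1 + \mu)/2)$. It compares the results with the variance reduction and the mutual information, both computed directly from the joint distribution of $(\omega, z)$. The function names are hypothetical, and base-2 logarithms are assumed, so entropy is measured in bits.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, using the convention 0 * log 0 = 0."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)


def h_quadratic(mu: float) -> float:
    return mu * mu


def h_entropy(mu: float) -> float:
    # H(1/2) - H((1 + mu)/2); H(1/2) = 1 bit, so h(0) = 0 as Assumption 1(i) requires.
    return 1.0 - binary_entropy((1.0 + mu) / 2.0)


def cost(pi: dict, h) -> float:
    """Equation (1) with states omega = +/-1 equally likely; pi[z] = (Pi(z|1), Pi(z|-1))."""
    total = 0.0
    for p_good, p_bad in pi.values():
        pi_z = 0.5 * (p_good + p_bad)
        total += pi_z * h((p_good - p_bad) / (2.0 * pi_z))
    return total


def variance_reduction(pi: dict) -> float:
    """Prior Var(omega) minus the expected posterior variance."""
    expected_post_var = 0.0
    for p_good, p_bad in pi.values():
        pi_z = 0.5 * (p_good + p_bad)
        post = {1.0: 0.5 * p_good / pi_z, -1.0: 0.5 * p_bad / pi_z}  # P(omega | z)
        mean = sum(w * q for w, q in post.items())
        expected_post_var += pi_z * sum(q * (w - mean) ** 2 for w, q in post.items())
    return 1.0 - expected_post_var  # prior variance of omega = +/-1 is 1


def mutual_information(pi: dict) -> float:
    """I(omega; z) in bits, summing p(omega, z) log2[p(omega, z) / (p(omega) p(z))]."""
    info = 0.0
    for p_good, p_bad in pi.values():
        pi_z = 0.5 * (p_good + p_bad)
        for cond in (p_good, p_bad):
            joint = 0.5 * cond
            if joint > 0.0:
                info += joint * math.log2(joint / (0.5 * pi_z))
    return info


if __name__ == "__main__":
    example = {"R": (0.8, 0.3), "L": (0.2, 0.7)}  # hypothetical aggregator
    print(cost(example, h_quadratic), variance_reduction(example))
    print(cost(example, h_entropy), mutual_information(example))
```

Each printed pair should agree up to floating-point rounding. Both identities rely on the prior putting probability 1/2 on each state.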

This section examines the effect of personalized news aggregation on EAS. Two remarks before we proceed. First, all results except those in Section 3.1 rely on Assumption 1. Second, it is easy to verify that whenever accountability is sustainable, a unique equilibrium sustains it.

In this section, we lift voters' bandwidth constraints so that they can absorb the performance data without error. The next lemma characterizes EAS in this benchmark case.

Lemma 1.

Suppose $I_k \geq h(1)$ for all $k \in K$. Then in the case where $a_h = 1$, all voters vote for the incumbent if $\omega = 1$ and for the challenger if $\omega = -1$. Accountability is sustainable if and only if $1 \geq \hat{c}$. Here the left-hand side, 1, is the difference in the probabilities that the incumbent wins re-election given $\omega = 1$ and $\omega = -1$, respectively, and
$$\hat{c} = \frac{c}{p_1(1) - p_0(1)}$$
is the threshold that must be crossed to sustain accountability in equilibrium. Equilibrium selection equals 1/2 with accountability and zero without accountability.

In this section, we bring back voters' bandwidth constraints and solve for their personalized news aggregators in the case where $a_h = 1$. (If instead $a_h = 0$, the incumbent's performance data are uninformative about his ability and are therefore ignored by voters.) Extreme voters would always vote along party lines without paying attention. They therefore pay attention only in order to be (sometimes) convinced to vote across party lines.

Suppose voter k consumes the content generated by a news aggregator $\Pi$ and observes the realization z. He strictly prefers candidate R to L if $v_k + \mu_z > 0$, and he strictly prefers candidate L to R if $v_k + \mu_z < 0$. Thus, his expected utility gain from using $\Pi$ is
$$V_k(\Pi) = \sum_{z \in Z} \pi_z \cdot \nu_k(\mu_z), \qquad \nu_k(\mu_z) = \begin{cases} \max\{v_k + \mu_z,\, 0\} & \text{if } k \leq 0, \\ \max\{-(v_k + \mu_z),\, 0\} & \text{if } k > 0, \end{cases}$$
and a personalized news aggregator for him solves
$$\max_{Z,\; \Pi: \Omega \to \Delta(Z)} V_k(\Pi) \quad \text{s.t.} \quad I(\Pi) \leq I_k.$$
The next lemma gives preliminary characterizations of personalized news aggregators; its intuition was already discussed in Section 1.
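The voter's problem just stated can be explored numerically before turning to Lemma 2. The sketch below is not the authors' solution method, and all function names and parameter values are hypothetical. It rests on three assumptions:
- the attention cost is quadratic, $h(\mu) = \mu^2$;
- only binary recommendation aggregators are considered (Lemma 2 shows that this is without loss), parameterized by $a = \Pi(R \mid \omega = 1)$ and $b = \Pi(R \mid \omega = -1)$;
- the best aggregator is found by brute-force grid search.

```python
from typing import Dict, Tuple


def h(mu: float) -> float:
    """Illustrative attention cost h(mu) = mu^2 (an assumption for this sketch)."""
    return mu * mu


def nu(k: int, v_k: float, mu: float) -> float:
    """Expressive-voting gain nu_k(mu) relative to voting without paying attention."""
    if k <= 0:
        return max(v_k + mu, 0.0)  # default vote L; switch to R when v_k + mu > 0
    return max(-(v_k + mu), 0.0)  # default vote R; switch to L when v_k + mu < 0


def evaluate(a: float, b: float, k: int, v_k: float) -> Tuple[float, float]:
    """(V_k, I) for the aggregator with a = Pi(R | omega=1), b = Pi(R | omega=-1)."""
    value, cost = 0.0, 0.0
    for p_good, p_bad in ((a, b), (1.0 - a, 1.0 - b)):  # signal R, then signal L
        pi_z = 0.5 * (p_good + p_bad)
        if pi_z == 0.0:
            continue  # a signal that never occurs contributes nothing
        mu_z = (p_good - p_bad) / (2.0 * pi_z)
        value += pi_z * nu(k, v_k, mu_z)
        cost += pi_z * h(mu_z)
    return value, cost


def personalized_aggregator(k: int, v_k: float, bandwidth: float,
                            steps: int = 200) -> Dict[str, float]:
    """Grid search over (a, b) with a >= b, subject to I <= bandwidth."""
    best_value, best_a, best_b = 0.0, 0.5, 0.5  # start from no information
    for i in range(steps + 1):
        a = i / steps
        for j in range(i + 1):  # b <= a: label R as the "good news" signal
            b = j / steps
            value, cost = evaluate(a, b, k, v_k)
            if cost <= bandwidth and value > best_value:
                best_value, best_a, best_b = value, a, b
    return {"V": best_value, "a": best_a, "b": best_b,
            "pi_R": 0.5 * (best_a + best_b), "P": best_a - best_b}


if __name__ == "__main__":
    for label, k, v in (("centrist", 0, 0.0), ("right-wing", 1, 0.1), ("right-wing", 1, 0.5)):
        sol = personalized_aggregator(k, v, bandwidth=0.2)
        print(f"{label:10s} v={v:.1f}: Pi(R)={sol['pi_R']:.3f}, P={sol['P']:.3f}")
```

With a bandwidth of 0.2, the right-wing voter's aggregator recommends R more than half the time, and more often when $v_1$ rises from 0.1 to 0.5. Over the same change, its incentive power falls. The centrist's aggregator is neutral up to grid error. These patterns are in line with Lemmas 3 and 4(i). The grid trades precision for transparency; a finer grid or a continuous optimizer would sharpen the numbers.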

Lemma 2.

In the case where $a_h = 1$, the personalized news aggregator $\Pi_k$ for any voter $k \in K$ is unique, exhausts his bandwidth, and prescribes voting recommendations that he strictly obeys, i.e., $I(\Pi_k) = I_k$, $Z_k = \{L, R\}$, and $v_k + \mu_{L,k} < 0 < v_k + \mu_{R,k}$.

Hereafter we focus on news aggregators that prescribe voting recommendations to their users. Such an aggregator is neutral if it recommends both candidates with equal probability. It is L-biased if it recommends candidate L more often than R, and R-biased if it recommends candidate R more often than L. The next lemma characterizes the biases of personalized news aggregators; its intuition was already discussed in Section 1.

Lemma 3.

In the case where $a_h = 1$, the centrist voter's news aggregator $\Pi_0$ is neutral. Extreme voters' news aggregators $\Pi_{-1}$ and $\Pi_1$, by contrast, exhibit an own-party bias: they recommend voters' own-party candidates more often than their opposite-party candidates, and they do so more often as $v_1$ increases.

To see how personalized news aggregation affects aggregate outcomes, we introduce two concepts. The first concept is the incentive power generated by a personalized news aggregator $\Pi_k$. It is the difference between the probabilities that $\Pi_k$ recommends re-electing the incumbent in the good and in the bad performance state:
$$P_k = \Pi_k(R \mid \omega = 1) - \Pi_k(R \mid \omega = -1).$$
Intuitively, $P_k$ captures voter k's ability to discern good and bad performances. In the representative voter paradigm, it would therefore measure his ability to hold the incumbent accountable.

The second concept is partisan disagreement. It is defined as the probability that extreme voters receive conflicting voting recommendations and so disagree about which candidate to vote for:
$$D = \mathbb{P}^{\Pi_{\pm 1}}(z_1 \neq z_{-1} \mid a_h = 1).$$
Many scholars have documented the sharp rise in partisan disagreement in recent years. In the current setting, D is strictly bounded away from one. The reason is that extreme voters' news aggregators do recommend their opposite-party candidates with positive, albeit small, probabilities (see Section 1 for intuition and evidence). Such recommendations could lead suburban Republicans to vote against Trump or working-class people to vote for Trump. They play an implicit role in the upcoming analysis.

The next lemma demonstrates the tension between the $P_k$'s and D as we vary model primitives.

Lemma 4.

(i) $P_{-1}$ and $P_1$ are decreasing in $v_1$, whereas D is increasing in $v_1$. (ii) $P_k$ is increasing in $I_k$, whereas D is in general non-monotonic in $I_1$.

Lemma 4(i) holds because as extreme voters become more partisan, their news aggregators endorse their own-party candidates more often regardless of the incumbent's performance.
As a result, the incentive power generated by their news aggregators decreases, and partisan disagreements arise more frequently than before.

Lemma 4(ii) shows that as voters' bandwidths increase, their news aggregators become more Blackwell informative and so individually generate more incentive power. Partisan disagreement can either increase or decrease. It decreases if, thanks to the flexibility of attention allocation, extreme voters significantly increase support for their own-party candidates when data are favorable but are reluctant to cut back that support when data are unfavorable.

In this section, we take as given the personalized news aggregators solved for in the previous section and check whether they can sustain accountability in equilibrium. We also solve for the degree of equilibrium selection with and without accountability.

Our analysis relies on a key concept called the societal incentive power. Suppose voters use the personalized news aggregators solved for in the previous section. The societal incentive power is then the difference between the probabilities that the incumbent wins re-election in the good and in the bad performance state:
$$\xi = \mathbb{P}^{(\Pi_k)_{k \in K}}(R \text{ wins re-election} \mid \omega = 1) - \mathbb{P}^{(\Pi_k)_{k \in K}}(R \text{ wins re-election} \mid \omega = -1).$$
Intuitively, $\xi$ captures society's ability to uphold EAS by rewarding good performances and punishing bad performances. It equals one in the benchmark model but is strictly less than one here because of the bandwidth constraints. The next theorem expresses EAS as functions of $\xi$ and reduces $\xi$ to model primitives.

Theorem 1.

Accountability is sustainable if and only if $\xi \geq \hat{c}$, where
$$\xi = \begin{cases} P_0 & \text{if } f_0 \geq 1/2, \\ P_1 + D P_0 & \text{if } f_0 < 1/2. \end{cases}$$
Equilibrium selection equals $\xi/2$ with accountability and zero without accountability.

We distinguish between two cases: $f_0 \geq 1/2$ and $f_0 < 1/2$. In the first case, voter 0 commands a majority on his own and thus determines the election outcome, so $\xi$ equals the incentive power of his news aggregator. In the second case, no voter commands a majority on his own. The extreme voters determine the outcome when they agree. The centrist voter is pivotal when they disagree, which happens with probability D.

As $\xi$ increases, society as a whole becomes better at rewarding good performances and punishing bad performances. Accountability becomes sustainable at the moment $\xi$ crosses the threshold $\hat{c}$ from below. As for selection, without accountability voters pay no attention to the incumbent's performance, which makes selection impossible. With accountability, equilibrium selection is proportional to $\xi$. This justifies using $\xi$ to measure both equilibrium accountability and selection (EAS).

Condition (22) in Appendix A.2, which is necessary and sufficient for D to increase with $I_1$, exploits the curvature of the function h.

This section examines the comparative statics of the societal incentive power. The next proposition concerns the effect of changing voters' horizontal preference parameter.
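The formula in Theorem 1 can be cross-checked by brute force. The sketch below uses hypothetical names and illustrative numbers. It enumerates all signal profiles under conditional independence, applies majority rule with ties favoring the incumbent, and compares the resulting $\xi$ with the piecewise formula. It also applies the threshold $\hat{c} = c/(p_1(1) - p_0(1))$ and the selection formula $\xi/2$. Two assumptions are built in:
- voters obey their recommendations, as Lemma 2 states;
- the left-wing aggregator mirrors the right-wing one, $\Pi_{-1}(R \mid \omega) = 1 - \Pi_1(R \mid -\omega)$, as the symmetric environment suggests.

The check uses population shares away from the knife-edge $f_0 = 1/2$.

```python
from itertools import product

VOTERS = (-1, 0, 1)


def mirror(a: float, b: float):
    """Left-wing aggregator as the mirror image of the right-wing one (assumption):
    Pi_{-1}(R | omega) = 1 - Pi_1(R | -omega)."""
    return 1.0 - b, 1.0 - a


def win_probability(omega: int, aggs: dict, masses: dict) -> float:
    """P(R re-elected | omega); aggs[k] = (Pi_k(R | omega=1), Pi_k(R | omega=-1)).
    Signals are conditionally independent and voters obey their recommendations."""
    total = 0.0
    for profile in product((True, False), repeat=3):  # True = "vote R"
        prob = 1.0
        for k, says_r in zip(VOTERS, profile):
            r = aggs[k][0] if omega == 1 else aggs[k][1]
            prob *= r if says_r else 1.0 - r
        r_mass = sum(masses[k] for k, says_r in zip(VOTERS, profile) if says_r)
        if r_mass >= 0.5 - 1e-12:  # majority rule, ties favor the incumbent
            total += prob
    return total


def xi_enumerated(aggs: dict, masses: dict) -> float:
    return win_probability(1, aggs, masses) - win_probability(-1, aggs, masses)


def xi_theorem1(aggs: dict, f0: float) -> float:
    p0 = aggs[0][0] - aggs[0][1]
    p1 = aggs[1][0] - aggs[1][1]
    (l_good, l_bad), (r_good, r_bad) = aggs[-1], aggs[1]
    # D: probability that extreme voters' recommendations differ, omega = +/-1 equally likely.
    d = (0.5 * (l_good * (1 - r_good) + (1 - l_good) * r_good)
         + 0.5 * (l_bad * (1 - r_bad) + (1 - l_bad) * r_bad))
    return p0 if f0 >= 0.5 else p1 + d * p0


def accountability(xi: float, c: float, p1_good: float, p0_good: float):
    """Return (sustainable, selection) using c_hat = c / (p_1(1) - p_0(1))."""
    c_hat = c / (p1_good - p0_good)
    sustainable = xi >= c_hat
    return sustainable, (xi / 2.0 if sustainable else 0.0)


if __name__ == "__main__":
    right = (0.9, 0.55)  # hypothetical Pi_1(R | omega = 1), Pi_1(R | omega = -1)
    aggs = {-1: mirror(*right), 0: (0.7, 0.3), 1: right}
    for f0 in (0.2, 0.6):
        f1 = (1.0 - f0) / 2.0
        masses = {-1: f1, 0: f0, 1: f1}
        xi = xi_enumerated(aggs, masses)
        print(f"f0={f0}: enumerated xi={xi:.4f}, Theorem 1 xi={xi_theorem1(aggs, f0):.4f}",
              accountability(xi, c=0.1, p1_good=0.7, p0_good=0.4))
```

Under mirroring, the probability of partisan disagreement turns out to be the same in both performance states. This is why the enumerated value collapses to $P_1 + D P_0$ when $f_0 < 1/2$.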

Proposition 1.

In the case where $f_0 < 1/2$, $\xi$ is in general non-monotonic in $v_1$.

As extreme voters become more partisan, their news aggregators become more biased and, by Lemma 4, individually generate less incentive power. This effect alone would make accountability harder to sustain. There is, however, a countervailing effect stemming from the disagreement between extreme voters, which arises more frequently as they become more partisan. When they disagree, the centrist voter is pivotal, and his contribution to the societal incentive power increases with his bandwidth.

Consider first the extreme situation where $I_0 \approx 0$. The centrist voter's contribution is then negligible, i.e., $P_0 \approx 0$, so $\xi \approx P_1$, which is decreasing in $v_1$ by Lemma 4. In the opposite situation, $I_0$ is large and the centrist voter can absorb the performance data with few errors, i.e., $P_0 \approx 1$. Then $\xi \approx P_1 + D$, which is maximized when $v_1$ is close to one (as shown in Appendix A.3). In intermediate cases, $\xi$ can vary non-monotonically with $v_1$, as illustrated by Example 1 in Appendix A.3.

In recent years, news aggregators such as Allsides.com have been built to battle the rising polarization by feeding extreme voters unbiased news. The current analysis warns that caution must be exercised when evaluating the consequences of adopting these news aggregators on a large scale. In terms of its effect on EAS, feeding extreme voters unbiased news is mathematically equivalent to reducing $v_1$. By Proposition 1, this could make EAS harder, not easier, to sustain. In addition, extreme voters prefer to consume biased news by Lemma 3. Forcing news to be unbiased could therefore reduce their expressive voting utilities and hence their welfare.

The next proposition examines the effect of changing voters' bandwidths on the societal incentive power.

Proposition 2.

In the case where $f < 1/2$, ξ is increasing in $I_0$ but in general non-monotonic in $I_1$.

The level of political knowledge among ordinary citizens is viewed as an important determinant of how well elected officials can be held accountable. Recently, scholars and pundits have voiced growing concerns over people's shrinking attention spans, caused by an overabundance of entertainment, the advent of the Internet and mobile devices, and the increased competition between firms for consumer eyeballs (Teixeira (2014); Dunaway (2016); Prior (2017)). Proposition 2 paints a rosier picture. As a voter's bandwidth decreases, his news aggregator generates less incentive power, which would undermine EAS in the representative-voter paradigm. With multiple voters and personalized news aggregation, there is an additional effect stemming from partisan disagreements, which by Lemma 4 is non-monotonic in extreme voters' bandwidths. Thus, while a reduction in the centrist voter's bandwidth unambiguously undermines EAS, nothing as clear-cut can be said about extreme voters' bandwidths.

The next proposition examines the effect of changing voters' population distribution on the societal incentive power.

Proposition 3.

Let ξ and ξ′ be the societal incentive power obtained under two population distributions f and f′, where $f \geq 1/2 > f'$. Then $\xi' - \xi = P_1 - (1 - D) P_0$ is decreasing in $I_0$ but is in general non-monotonic in v and $I_1$.

Recently, a growing body of literature has been devoted to understanding voter polarization (also termed mass polarization). Notably, Fiorina and Abrams (2008) define mass polarization as a bimodal distribution of voters' preferences on a liberal-conservative scale. Gentzkow (2016) advocates using the average ideological distance between Democrats and Republicans to measure mass polarization. Inspired by these authors, we define increasing mass polarization as a mean-preserving spread of voters' horizontal preferences. Proposition 3 suggests that increasing mass polarization could prove conducive to EAS, despite its negative connotation in everyday political discourse. As we keep redistributing voters from the center to the margins, extreme voters eventually become pivotal with positive probability. At that moment they start to contribute to EAS, whereas the centrist voter's contribution is discounted by the probability that the extreme voters reach a consensus on which candidate to vote for. The overall effect on EAS is ambiguous and, by Lemma 4, in particular non-monotonic in extreme voters' horizontal preferences and bandwidths.
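The closed forms in Theorem 1 and Proposition 3 can be checked numerically. The Python sketch below illustrates them under stated assumptions and is not the authors' code. It assumes:

- binary states ω ∈ {−1, 1} with prior mean zero;
- conditionally independent signals;
- a centrist profile $(x_0, y_0)$;
- a voter −1 whose aggregator mirrors voter 1's profile $(x_1, y_1)$, which is the symmetry used in the proof of Theorem 1 in Appendix A.3.

The sketch computes $P_0$, $P_1$, and D from the formulas in that proof. It then cross-checks $\xi = P_1 + D P_0$ against a brute-force enumeration of all eight recommendation profiles, assuming every voter follows his recommendation. All function names (for example `xi_closed_form`) and the numerical profiles are hypothetical.

```python
from itertools import product


def reco_r_probs(x, y):
    """Return (a+, a-) = (Pi(R | omega = 1), Pi(R | omega = -1)) for a binary
    aggregator with posterior means -x after signal L and y after signal R,
    assuming omega is uniform on {-1, 1} (prior mean zero)."""
    return x * (1 + y) / (x + y), x * (1 - y) / (x + y)


def incentive_components(x0, y0, x1, y1):
    """Return (P_0, P_1, D) using the formulas in the proof of Theorem 1."""
    p0 = 2 * x0 * y0 / (x0 + y0)
    p1 = 2 * x1 * y1 / (x1 + y1)
    d = 1 - 2 * x1 * y1 * (1 + x1 * y1) / (x1 + y1) ** 2
    return p0, p1, d


def xi_closed_form(x0, y0, x1, y1, f):
    """Theorem 1: xi = P_0 if f >= 1/2, and xi = P_1 + D * P_0 if f < 1/2."""
    p0, p1, d = incentive_components(x0, y0, x1, y1)
    return p0 if f >= 0.5 else p1 + d * p0


def xi_brute_force(x0, y0, x1, y1, f):
    """P(R wins | omega = 1) - P(R wins | omega = -1) by enumerating all eight
    recommendation profiles (z_-1, z_0, z_1). Assumes conditionally independent
    signals, that each voter follows his recommendation, and that R wins iff
    the population mass voting R exceeds 1/2 (the tie at f = 1/2 is not handled)."""
    a0p, a0m = reco_r_probs(x0, y0)
    a1p, a1m = reco_r_probs(x1, y1)
    # Voter -1 mirrors voter 1: a_{-1}^+ = 1 - a_1^- and a_{-1}^- = 1 - a_1^+.
    a_plus = {-1: 1 - a1m, 0: a0p, 1: a1p}
    a_minus = {-1: 1 - a1p, 0: a0m, 1: a1m}
    mass = {-1: (1 - f) / 2, 0: f, 1: (1 - f) / 2}

    def win_prob(a):
        total = 0.0
        for profile in product((True, False), repeat=3):  # True means "vote R"
            prob, r_mass = 1.0, 0.0
            for k, votes_r in zip((-1, 0, 1), profile):
                prob *= a[k] if votes_r else 1 - a[k]
                r_mass += mass[k] if votes_r else 0.0
            if r_mass > 0.5:
                total += prob
        return total

    return win_prob(a_plus) - win_prob(a_minus)


def equilibrium_eas(xi, c_hat):
    """Theorem 1: accountability iff xi >= c_hat; selection is xi/2 if so, else 0."""
    accountable = xi >= c_hat
    return accountable, (xi / 2 if accountable else 0.0)


def polarization_effect(x0, y0, x1, y1):
    """Proposition 3: xi' - xi = P_1 - (1 - D) * P_0 for f >= 1/2 > f'."""
    p0, p1, d = incentive_components(x0, y0, x1, y1)
    return p1 - (1 - d) * p0


if __name__ == "__main__":
    # Hypothetical profiles: neutral centrist (x0 = y0) and R-biased voter 1 (x1 > y1).
    profile = (0.5, 0.5, 0.8, 0.3)
    for f in (0.6, 0.3):
        print(f, xi_closed_form(*profile, f), xi_brute_force(*profile, f))
    print(equilibrium_eas(xi_closed_form(*profile, 0.3), c_hat=0.5))
    print(polarization_effect(*profile))
```

For any f ≠ 1/2, the closed form and the enumeration should agree up to floating-point error. The enumeration is included only to make explicit the event decomposition (i)/(ii) used in the proof of Theorem 1.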

Correlated news aggregation

So far we have restricted news signals to be conditionally independent across voters. Correlating the news signals among voters, if done appropriately, […]

[Footnote: We are not arguing that mass polarization is on the rise (existing evidence is mixed); we are merely examining its consequences for EAS.]

Future work

Recently, tech-enabled personalization has become the subject of many hotly debated regulatory proposals (The General Data Protection Regulation (2016); Warren (2019)). In our opinion, an essential step in evaluating the effectiveness of these proposals is to understand their consequences for politics. This paper examines the EAS effect of personalized news aggregation for RI voters. Our model is a simple one (Appendix B investigates an extension to a continuum of performance states), and we hope to enrich it with multidimensional policy efforts and richer voter groups in the future. As for how to test our theory, we believe an essential first step is to test the model of posterior separability and, more broadly, to study how people pay attention in the political context. We also believe in the usefulness of studying the experiments conducted by tech companies, or the regulatory uncertainties they face, because these could generate the exogenous variation needed for empirical research. We hope someone, maybe us, will pursue these agendas in the future.

[Footnote: To pin down the extent of improvement, one needs to solve a linear programming problem. Its maximand is ξ, its choice variable is the joint news distribution, and its constraint is that the marginal news distributions must equal those generated by the personalized news aggregators. The problem is computational in nature and hence is not pursued here.]

References

Adachi, T., and Y. Hizen. (2014): "Political accountability, electoral control, and media bias," Japanese Economic Review, 65(3), 316-343.

Ambuehl, S., A. Ockenfels, and C. Stewart. (2019): "Attention and selection effects," Working Paper.

Anderson, S. P., D. Strömberg, and J. Waldfogel, eds. (2016): Handbook of Media Economics, Elsevier.

Ashworth, S. (2012): "Electoral accountability: Recent theoretical and empirical work," Annual Review of Political Science, 15, 183-201.

——— (2014): "Is voter competence good for voters?: Information, rationality, and democratic performance," American Political Science Review, 108(3), 565-587.

Ashworth, S., E. B. de Mesquita, and A. Friedenberg. (2016): "Accountability and information in elections," American Economic Journal: Microeconomics, 9(2), 95-138.

Athey, S., and M. Mobius. (2012): "The impact of aggregators on Internet news consumption: The case of localization," Working Paper.

Athey, S., M. Mobius, and J. Pal. (2017): "The impact of aggregators on Internet news consumption," Working Paper.

Bernheim, B. D., and M. D. Whinston. (1986): "Common agency," Econometrica, 54(4), 923-942.

Besley, T., and A. Prat. (2006): "Handcuffs for the grabbing hand? Media capture and government accountability," American Economic Review, 96(3), 720-736.

Burke, J. (2008): "Primetime spin: Media bias and belief confirming information," Journal of Economics and Management Strategy, 17(3), 633-665.

Calvert, R. L. (1985): "The value of biased information: A rational choice model of political advice," Journal of Politics, 47(2), 530-555.

Caplin, A. (2016): "Measuring and modeling attention," Annual Review of Economics, 8, 379-403.

Caplin, A., and M. Dean. (2013): "Behavioral implications of rational inattention with Shannon entropy," NBER Working Paper.

——— (2015): "Revealed preference, rational inattention, and costly information acquisition," American Economic Review, 105(7), 2183-2203.

Caplin, A., M. Dean, and J. Leahy. (2019): "Rationally inattentive behavior: Characterizing and generalizing Shannon entropy," Working Paper.

Carroll D, J. Kiley, and N. Asheer. (2019): "In a politically polarized era, sharp divides in both partisan coalitions," Pew Research Center, December 17.

Chan, J., and W. Suen. (2008): "A spatial theory of news consumption and electoral competition," Review of Economic Studies, 75(3), 699-728.

Che, Y.-K., and K. Mierendorff. (2019): "Optimal dynamic allocation of attention," American Economic Review, 109(8), 2993-3029.

Chiang, C.-F., and B. Knight. (2011): "Media bias and influence: Evidence from newspaper endorsements," Review of Economic Studies, 78(3), 795-820.

Chiou, L., and C. Tucker. (2017): "Content aggregation by platforms: The case of the news media," Journal of Economics and Management Strategy, 26, 782-805.

Canes-Wrone, B., M. C. Herron, and K. W. Shotts. (2001): "Leadership and pandering: A theory of executive policymaking," American Journal of Political Science, 45(3), 532-550.

Dean, M., and N. Neligh. (2019): "Experimental tests of rational inattention," Working Paper.

DellaVigna, S., and M. Gentzkow. (2010): "Persuasion: Empirical evidence," Annual Review of Economics, 2, 643-669.

Denti, T. (2018): "Posterior-separable cost of information," Working Paper.

Dewatripont, M., I. Jewitt, and J. Tirole (1999): "The economics of career concerns, Part I: Comparing information structures," Review of Economic Studies, 66(1), 183-198.

Downs, A. (1957): An Economic Theory of Democracy, New York, NY: Harper and Row, 1st ed.

Dunaway, J. (2016): "Mobile vs. computer: Implications for news audiences and outlets," Shorenstein Center on Media, Politics and Public Policy, August 30.

Duggan, J., and C. Martinelli. (2017): "The political economy of dynamic elections: Accountability, commitment, and responsiveness," Journal of Economic Literature, 55(3), 916-984.

Egorov, G. (2009): "Political accountability under special interest politics," Working Paper.

Eslami, M., A. Aleyasen, K. G. Karahalios, K. Hamilton, and C. Sandvig. (2015): "FeedVis: A path for exploring news feed curation algorithms," CSCW'15 Companion: Proceedings of the 18th ACM Conference Companion on Computer Supported Cooperative Work & Social Computing, 65-68.

European Parliament and Council of European Union (2016): Regulation (EU) 2016/679, https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN, Accessed 08/08/2020.

Fiorina, M. P., and S. J. Abrams. (2008): "Political polarization in the American public," Annual Review of Political Science, 11, 563-588.

Flaxman, S., S. Goel, and J. M. Rao. (2016): "Filter bubbles, echo chambers and online news consumption," Public Opinion Quarterly, 80(S1), 298-320.

Gailmard, S. (2009): "Multiple principals and oversight of bureaucratic policy-making," Journal of Theoretical Politics, 21(2), 161-186.

Gehlbach, S., and K. Sonin. (2014): "Government control of the media," Journal of Public Economics, 118, 163-171.

Gentzkow, M. (2016): "Polarization in 2016," Working Paper.

Hébert, B., and M. D. Woodford. (2017): "Rational inattention in continuous time," Working Paper.

Hu, L., A. Li, and I. Segal. (2019): "The politics of personalized news aggregation," Working Paper.

Kamenica, E., and M. Gentzkow. (2011): "Bayesian persuasion," American Economic Review, 101(6), 2590-2615.

Khalil, F., D. Martimort, and B. Parigi. (2007): "Monitoring a common agent: Implications for financial contracting," Journal of Economic Theory, 135, 35-67.

Maćkowiak, B., F. Matějka, and M. Wiederholt. (2018): "Rational inattention: A disciplined behavioral model," Working Paper.

Maskin, E., and J. Tirole. (2004): "The politician and the judge: Accountability in government," American Economic Review, 94(4), 1034-1054.

Matějka, F., and A. McKay. (2015): "Rational inattention to discrete choices: A new foundation for the multinomial logit model," American Economic Review, 105(1), 272-298.

Matějka, F., and G. Tabellini. (2016): "Electoral competition with rationally inattentive voters," Working Paper.

Matsa, K. E., and K. Lu. (2016): "10 facts about the changing digital news landscape," Pew Research Center, September 14.

Morris, S., and P. Strack. (2017): "The Wald problem and the equivalence of sequential sampling and static information costs," Working Paper.

Mullainathan, S., and A. Shleifer. (2005): "The market for news," American Economic Review, 95(4), 1031-1053.

Obama, B. (2017): "President Obama's farewell address," https://obamawhitehouse.archives.gov/farewell, Accessed 03/28/2019.

Pariser, E. (2011): The Filter Bubble: How the New Personalized Web Is Changing What We Read and How We Think, New York, NY: Penguin Press.

Perego, J., and S. Yuksel. (2018): "Media competition and social disagreement," Working Paper.

Peters, M. (2001): "Common agency and the revelation principle," Econometrica, 69(5), 1349-1372.

Prat, A. (2005): "The wrong kind of transparency," American Economic Review, 95(3), 862-877.

——— (2018): "Media power," Journal of Political Economy, 126(4), 1747-1783.

Prat, A., and D. Strömberg. (2013): "The political economy of mass media," in Advances in Economics and Econometrics: Theory and Applications, Tenth World Congress, ed. by D. Acemoglu, M. Arellano, and E. Dekel. Cambridge University Press.

Prior, M. (2017): "Conditions for political accountability in a high-choice media environment," in The Oxford Handbook of Political Communication, ed. by K. Kenski and K. H. Jamieson. Oxford University Press.

Shannon, C. E. (1948): "A mathematical theory of communication," Bell Labs Technical Journal, 27(3), 379-423.

Sims, C. A. (1998): "Stickiness," Carnegie-Rochester Conference Series on Public Policy, 49(1), 317-356.

——— (2003): "Implications of rational inattention," Journal of Monetary Economics, 50(3), 665-690.

Strömberg, D. (2015): "Media and politics," Annual Review of Economics, 7, 173-205.

Suen, W. (2004): "The self-perpetuation of biased beliefs," The Economic Journal, 114, 377-396.

Sunstein, C. R. (2009): Republic.com 2.0, Princeton, NJ: Princeton University Press.

Teixeira, T. S. (2014): "The rising cost of consumer attention: Why you should care, and what you can do about it," Working Paper.

Tsakas, E. (2019): "Robust scoring rules," Theoretical Economics, 15, 955-987.

Warren, E. (2019): "Here's how we can break up Big Tech," Medium, March 8.

Warren, P. (2012): "Independent auditors, bias, and political agency," Journal of Public Economics, 96, 78-88.

Wolton, S. (2019): "Are biased media bad for democracy?," American Journal of Political Science, 63(3), 548-562.

Zhong, W. (2017): "Optimal dynamic information acquisition," Working Paper.

Appendices (For online publication only)

A Proofs

A.1 Preliminaries

As explained in Section 2.1, it is without loss to identify any finite signal structure $\Pi: \Omega \to \Delta(\mathcal{Z})$ with its corresponding tuple $\langle \pi_z, \mu_z \rangle_{z \in \mathcal{Z}}$. Bayes' plausibility mandates that the expected posterior mean must equal the prior mean, i.e.,

$$\sum_{z \in \mathcal{Z}} \pi_z \cdot \mu_z = 0. \tag{BP}$$

For any binary signal structure, we write $\mathcal{Z} = \{L, R\}$ and assume w.l.o.g. that $\mu_L < 0 < \mu_R$. From Bayes' plausibility, we deduce that

$$\pi_L = \frac{|\mu_R|}{|\mu_L| + \mu_R} \quad \text{and} \quad \pi_R = \frac{|\mu_L|}{|\mu_L| + \mu_R}, \tag{2}$$

so it is w.l.o.g. to identify the signal structure with the profile $(|\mu_L|, \mu_R)$, hereafter written as $(x, y)$. From consuming $(x, y)$, voter k gains the following amount of expressive voting utility:

$$V_k(x, y) = \begin{cases} \dfrac{x}{x+y}\,[v_k + y]^+ & \text{if } k \leq 0, \\[6pt] -\dfrac{y}{x+y}\,[v_k - x]^- & \text{if } k > 0, \end{cases} \tag{3}$$

and he must pay the following amount of attention:

$$I(x, y) = \frac{y}{x+y}\,h(x) + \frac{x}{x+y}\,h(y). \tag{4}$$

Thus, a typical level curve of the attention function is

$$C(I) = \left\{ (x, y) : \frac{y}{x+y}\,h(x) + \frac{x}{x+y}\,h(y) = I \right\}, \tag{5}$$

and it is downward sloping by Assumption 1. Among all signal structures lying on $C(I)$, only $(h^{-1}(I), h^{-1}(I))$ is neutral; the remaining ones are either L-biased ($x < y$) or R-biased ($x > y$). For any $(x, y), (x', y') \in C(I)$, either $(x, y)$ is more L-biased than $(x', y')$ ($x/y < x'/y'$), or $(x, y)$ is more R-biased than $(x', y')$ ($x/y > x'/y'$).

A.2 Proofs of lemmas

Proof of Lemma 1

If voters can observe the performance state without error, then in the case where the high-type incumbent exerts high effort, they would all vote for the incumbent (resp. challenger) if ω = 1 (resp. ω = −1). The incumbent's expected payoff given such behavior equals $p_1(1) - c$ if he exerts high effort and $p_0(1)$ if he exerts low effort. He is therefore willing to exert high effort if $p_1(1) - c \geq p_0(1)$, or equivalently $1 \geq \hat{c}$.

Proof of Lemma 2

We only prove the result for voter 1. The proofs for voters 0 and −1 are analogous.

Step 1.

Show that any solution to voter 1's problem must meet the descriptions in Lemma 2. Rewrite voter 1's problem as

$$\max_{\mathcal{Z},\ \Pi: \Omega \to \Delta(\mathcal{Z}),\ \lambda}\ V(\Pi) + \lambda\,(I - I(\Pi)) \quad \text{s.t.} \quad \lambda \geq 0,\ \ I - I(\Pi) \geq 0,\ \text{ and }\ \lambda\,(I - I(\Pi)) = 0, \tag{6}$$

where λ denotes a Lagrange multiplier associated with his bandwidth constraint. Notice that λ > 0. If λ = 0, the solution to Problem (6) is to fully reveal the performance state to the voter, who would then run out of bandwidth, a contradiction. This proves the claim that the voter must exhaust his bandwidth.

Now take any Lagrange multiplier λ > 0 and consider the relaxed problem

$$\max_{\mathcal{Z},\ \langle \pi_z, \mu_z \rangle_{z \in \mathcal{Z}}}\ \sum_{z \in \mathcal{Z}} \pi_z \left( -[v + \mu_z]^- - \lambda h(\mu_z) \right) \quad \text{s.t. (BP)}. \tag{7}$$

Note that the maximand of Problem (7) is posterior-separable. Also note that $-[v + \mu]^- - \lambda h(\mu)$ is the maximum of two strictly concave functions of µ, namely $-\lambda h(\mu)$ and $-v - \mu - \lambda h(\mu)$. Because these functions single-cross at µ = −v, their maximum is M-shaped. Given these observations, we can solve Problem (7) by applying the concavification method developed by Kamenica and Gentzkow (2011). It is easy to see that for any given λ > 0 […]

Step 2.

Show that the solution to Problem (6) is unique. In Step 1, we showed that the solution to the relaxed problem (7) is unique for any given λ > 0. It therefore remains to verify that the Lagrange multiplier associated with the voter's bandwidth constraint is unique. Suppose the contrary is true. Take any two distinct Lagrange multipliers λ > λ′ > 0, and let (x, y) and (x′, y′) denote the unique solutions to Problem (7) given λ and λ′, respectively. By strict optimality, the voter strictly prefers (x, y) to (x′, y′) when the Lagrange multiplier is λ, and he strictly prefers (x′, y′) to (x, y) when the Lagrange multiplier is λ′. We deduce that

$$\lambda\left(I(x', y') - I(x, y)\right) > V(x', y') - V(x, y) > \lambda'\left(I(x', y') - I(x, y)\right).$$

Then from λ > λ′, it follows that $I(x', y') > I(x, y)$, which contradicts the fact that $I(x, y) = I(x', y') = I$.

Proof of Lemma 3

Again, we only prove the result for voter 1, and in two steps.

Step 1.

Show that $\Pi_1$ is R-biased. Write (x, y) for $\Pi_1$. Note first that (x, y) cannot be L-biased. If the contrary were true, i.e., y > x, then voter 1 would strictly prefer (y, x) to (x, y):

$$V(y, x) = \frac{x}{x+y}(y - v) > \frac{y}{x+y}(x - v) = V(x, y).$$

It remains to show that (x, y) cannot be neutral, i.e., $x \neq y$. For starters, rewrite Problem (7) as

$$\max_{x \in [v, 1],\ y \in [0, 1]}\ \frac{y}{x+y}(x - v) - \lambda\left(\frac{y}{x+y}\,h(x) + \frac{x}{x+y}\,h(y)\right). \tag{8}$$

If the solution to Problem (8) were neutral, i.e., x = y, then only three situations could happen: $(x, y) \in (v, 1)^2$, $(x, y) = (1, 1)$, or $(x, y) = (v, v)$. In the first situation, (x, y) must satisfy the following first-order conditions:

$$y + v = \lambda\left[\Delta + h'(x)\,\Sigma\right] \tag{9}$$

and

$$x - v = \lambda\left[h'(y)\,\Sigma - \Delta\right], \tag{10}$$

where $\Delta := h(y) - h(x)$ and $\Sigma := x + y$. Plugging x = y into (9) and (10) and simplifying yields $x + v = 2\lambda h'(x)\,x = x - v$, which is impossible because v > 0. The second situation is impossible because the voter would run out of bandwidth. In the third situation, we have V(v, v) = 0. But then the voter would strictly prefer (1 − ε, ε) to (v, v) when ε > 0 is sufficiently small. This deviation is profitable,

$$V(1 - \epsilon, \epsilon) = \frac{\epsilon}{1 - \epsilon + \epsilon}\,(1 - \epsilon - v) > 0,$$

and it is moreover feasible:

$$I(1 - \epsilon, \epsilon) = \frac{\epsilon}{1 - \epsilon + \epsilon}\,h(1 - \epsilon) + \frac{1 - \epsilon}{1 - \epsilon + \epsilon}\,h(\epsilon) < \epsilon h(1) + h(\epsilon) < I.$$

Step 2.

Show that $\Pi_1$ becomes more R-biased as v increases. Fix any 0 < v < v′, and let (x, y) and (x′, y′) denote the unique solutions to Problem (6) when the voter's horizontal preference parameter is v and v′, respectively. The voter strictly prefers (x, y) to (x′, y′) when his horizontal preference parameter is v, and he strictly prefers (x′, y′) to (x, y) when it is v′, i.e.,

$$\frac{y}{x+y}(x - v) > \frac{y'}{x'+y'}(x' - v) \quad \text{and} \quad \frac{y'}{x'+y'}(x' - v') > \frac{y}{x+y}(x - v').$$

It follows that

$$\left(\frac{y'}{x'+y'} - \frac{y}{x+y}\right) v > \frac{x'y'}{x'+y'} - \frac{xy}{x+y} > \left(\frac{y'}{x'+y'} - \frac{y}{x+y}\right) v'$$

and hence that $y'/(x' + y') < y/(x + y)$. The last condition can be rewritten as $x'/y' > x/y$, which proves that (x′, y′) is more R-biased than (x, y).

Proof of Lemma 4

Let $(x_k, y_k)$ denote the personalized news aggregator for voter $k \in \mathcal{K}$. Tedious algebra (detailed in the proof of Theorem 1) shows that

$$P_k = \frac{2 x_k y_k}{x_k + y_k}\ \ \forall k \in \mathcal{K} \quad \text{and} \quad D = 1 - \frac{2 x_1 y_1 (1 + x_1 y_1)}{(x_1 + y_1)^2}. \tag{11}$$

Write $P_1$ and D as functions of $(x_1, y_1)$, or simply (x, y). For each function $g \in \{I, P_1, D\}$ of (x, y), write $g_x$ for $\partial g(x, y)/\partial x$ and $g_y$ for $\partial g(x, y)/\partial y$. Tedious algebra shows that

$$I_x = \frac{y}{(x+y)^2}\left[h(y) - h(x) + h'(x)(x+y)\right], \tag{12}$$

$$I_y = \frac{x}{(x+y)^2}\left[h(x) - h(y) + h'(y)(x+y)\right], \tag{13}$$

$$P_{1,x} = \frac{2y^2}{(x+y)^2}, \tag{14}$$

$$P_{1,y} = \frac{2x^2}{(x+y)^2}, \tag{15}$$

$$D_x = -\frac{2y\,(2xy^2 - x + y)}{(x+y)^3}, \tag{16}$$

and

$$D_y = -\frac{2x\,(2x^2 y - y + x)}{(x+y)^3}. \tag{17}$$

Here $I_x, I_y > 0$ because h is strictly increasing and strictly convex on [0, 1]. Hence $h(y) - h(x) + h'(x)(x+y) \geq h'(x)(y - x) + h'(x)(x+y) = 2h'(x)y > 0$ and, by the same argument, $h(x) - h(y) + h'(y)(x+y) > 0$.

Part (i): We first show that $P_1(x, y)$ decreases as we traverse voter 1's attention level curve C(I) from its neutral element to its most R-biased element. This portion of the attention level curve can be expressed as $C^+(I) = \{(x, y) \in C(I) : x \in [h^{-1}(I), 1]\}$, where x ≥ y, and x > y if and only if $x > h^{-1}(I)$. As we increase x by a small amount ε > 0, we must change y by approximately $(-I_x/I_y) \cdot \epsilon$ in order to stay on $C^+(I)$. The resulting change in $P_1$ equals approximately $(P_{1,x} - P_{1,y} I_x/I_y) \cdot \epsilon$. Since $P_{1,y} > 0$, it suffices to show that $-I_x/I_y < -P_{1,x}/P_{1,y}$ holds for all $(x, y) \in C^+(I)$. By (12), (13), (14), and (15), the last condition is equivalent to

$$\frac{h(y) - h(x) + h'(x)(x+y)}{h(x) - h(y) + h'(y)(x+y)} > \frac{y}{x}. \tag{18}$$

To complete the proof, note that the numerator and denominator on the left-hand side of (18) are both positive. Because h is strictly convex and strictly increasing on [0, 1], they can be bounded as follows:

$$\begin{aligned} h(y) - h(x) + h'(x)(x+y) &> h'(x)(y - x) + h'(x)(x+y) = 2h'(x)\,y, \\ h(x) - h(y) + h'(y)(x+y) &< h'(x)(x - y) + h'(x)(x+y) = 2h'(x)\,x. \end{aligned}$$

Combining these inequalities gives the desired result.

We next show that D(x, y) increases as we traverse $C^+(I)$ in the same direction. For any $(x, y) \in C^+(I)$, $D_y = -2x(2x^2y - y + x)/(x+y)^3 < 0$ because x ≥ y. Thus if $D_x \geq 0$, then $D_x - D_y I_x/I_y > 0$. If $D_x < 0$, then it suffices to show that $-I_x/I_y < -D_x/D_y$ holds for all $(x, y) \in C^+(I)$, or equivalently

$$\frac{h(y) - h(x) + h'(x)(x+y)}{h(x) - h(y) + h'(y)(x+y)} > \frac{2xy^2 - x + y}{2x^2 y - y + x}.$$

[…] x > y, which together imply that

$$\frac{h(y) - h(x) + h'(x)(x+y)}{h(x) - h(y) + h'(y)(x+y)} > \frac{y}{x} = \frac{2xy^2}{2x^2 y} > \frac{2xy^2 - x + y}{2x^2 y - y + x}.$$

Part (ii): We first prove the claim that $P_k$ is increasing in $I_k$ for voter 1. The proofs for voters 0 and −1 are analogous. Since $P_1(x, y) = 2xy/(x + y)$ is increasing in x and y, it suffices to show that x and y are both increasing in I. The remainder of the proof proceeds in two steps.

Step 1.

Show that x and y increase as the Lagrange multiplier associated with voter 1's bandwidth constraint decreases. Recall Problem (8), which maximizes voter 1's expressive voting utility while taking the Lagrange multiplier λ > 0 as given. For an interior solution $(x, y) \in (v, 1) \times [0, 1]$, adding (9) and (10) yields

$$h'(x) + h'(y) = 1/\lambda. \tag{19}$$

Using this result when differentiating (10) with respect to λ yields

$$\begin{aligned} -\frac{dx}{d\lambda} &= \Delta - h'(y)\Sigma + \lambda\left[h'(y)\frac{dy}{d\lambda} - h'(x)\frac{dx}{d\lambda} - h''(y)\frac{dy}{d\lambda}\Sigma - h'(y)\frac{dy}{d\lambda} - h'(y)\frac{dx}{d\lambda}\right] \\ &= \Delta - h'(y)\Sigma - \lambda h''(y)\frac{dy}{d\lambda}\Sigma - \frac{dx}{d\lambda}, \end{aligned}$$

where $\Delta := h(y) - h(x)$ and $\Sigma := x + y$. Therefore,

$$\frac{dy}{d\lambda} = \frac{\Delta - h'(y)\Sigma}{\lambda h''(y)\Sigma} = \frac{h(y) - h(x) - h'(y)(x+y)}{\lambda h''(y)(x+y)} < 0. \tag{20}$$

The last inequality exploits the facts that h″ > 0 and h′ > 0, that $h(y) - h(x) - h'(y)(x+y) < 0$ if 0 ≤ y < x, and that $h(y) - h(x) - h'(y)(x+y) \leq h'(y)(y - x) - h'(y)(x+y) < 0$ if y ≥ x ≥ v. Meanwhile, differentiating (19) with respect to λ yields

$$-h''(x)\frac{dx}{d\lambda} = h''(y)\frac{dy}{d\lambda} + \frac{1}{\lambda^2}.$$

Simplifying this result using (19) and (20) yields

$$\frac{dx}{d\lambda} = -\frac{\Delta + h'(x)\Sigma}{\lambda h''(x)\Sigma} = -\frac{h(y) - h(x) + h'(x)(x+y)}{\lambda h''(x)(x+y)} < 0. \tag{21}$$

The last inequality exploits the facts that h″ > 0 and h′ > 0, that $h(y) - h(x) + h'(x)(x+y) > 0$ if y ≥ x ≥ v, and that $h(y) - h(x) + h'(x)(x+y) \geq -h'(x)(x - y) + h'(x)(x+y) > 0$ if 0 ≤ y < x. Together, (20) and (21) imply that x and y strictly increase as λ slightly decreases. As λ decreases further, the solution to (8) may transition from an interior solution to a corner solution. When that happens, we must have x = 1, because (x, y) is R-biased by Lemma 3. As λ continues to decrease, x stays at 1 whereas y increases.

Step 2.

Show that the Lagrange multiplier associated with voter 1's bandwidth constraint decreases with his bandwidth. Take any 0 < I < I′. Let λ and λ′ denote the Lagrange multipliers associated with voter 1's bandwidth constraint when his bandwidth is I and I′, respectively. Let (x, y) and (x′, y′) denote the solutions to Problem (8) given λ and λ′, respectively. By strict optimality, voter 1 strictly prefers (x, y) to (x′, y′) when the Lagrange multiplier is λ, and he strictly prefers (x′, y′) to (x, y) when the Lagrange multiplier is λ′. We deduce that

$$\lambda\left[I(x', y') - I(x, y)\right] > V(x', y') - V(x, y) > \lambda'\left[I(x', y') - I(x, y)\right].$$

Then from $I(x, y) = I < I' = I(x', y')$, it follows that λ′ < λ. Together with the result shown in Step 1, this implies that x′ ≥ x and y′ ≥ y, with at least one of the inequalities strict.

We next show that D is in general non-monotonic in voter 1's bandwidth or, equivalently, in the Lagrange multiplier λ > 0. The derivative of D w.r.t. λ equals $D_x\, dx/d\lambda + D_y\, dy/d\lambda$. It is positive if and only if $D_x/|D_y| < |dy/d\lambda|/|dx/d\lambda|$, and the last condition holds if $D_x \leq 0$. In the case where $D_x > 0$, the last condition is equivalent to

$$\frac{-y\,(2xy^2 - x + y)}{x\,(2x^2 y - y + x)} < \frac{h(x) - h(y) + h'(y)(x+y)}{h(y) - h(x) + h'(x)(x+y)} \cdot \frac{h''(x)}{h''(y)}, \tag{22}$$

which can be violated by functions h that satisfy Assumption 1. To see why, note that the left-hand side of (22) is positive but smaller than one, whereas its right-hand side equals one if $h(x) = x^2$. Fix any $x^* \in (0, 1)$ and any ε > 0. Consider a function $h: [0, 1] \to \mathbb{R}_+$ where $h(x) = x^2$ if $x \in [0, x^*]$ and $h(x) = 2(x^* + \epsilon)(x - x^* - \epsilon)$ if $x \in [x^* + \epsilon, 1]$. On $[x^*, x^* + \epsilon]$, let h be any function that satisfies h′ > 0 and h″ > 0 and smooth-pastes at $x = x^*$ and $x = x^* + \epsilon$, and we are done.

A.3 Proofs of theorems and propositions

Proof of Theorem 1

Write $a_k^+$ for $\Pi_k(R \mid \omega = 1)$ and $a_k^-$ for $\Pi_k(R \mid \omega = -1)$. By symmetry, $a_{-1}^+ = 1 - a_1^-$, $a_1^+ = 1 - a_{-1}^-$, and $a_0^+ = 1 - a_0^-$. Also write $(x_k, y_k)$ for $\Pi_k$. Straightforward algebra shows that

$$a_k^+ = \frac{x_k(1 + y_k)}{x_k + y_k} \quad \text{and} \quad a_k^- = \frac{x_k(1 - y_k)}{x_k + y_k}.$$

Thus

$$P_k := a_k^+ - a_k^- = \frac{2 x_k y_k}{x_k + y_k} \quad \forall k \in \mathcal{K},$$

and

$$\begin{aligned} \mathbb{P}^{\Pi_k\text{s}}(z_{-1} \neq z_1 \mid \omega = 1) &= a_{-1}^+\left(1 - a_1^+\right) + \left(1 - a_{-1}^+\right) a_1^+ \\ &= \left(1 - a_1^-\right) a_{-1}^- + a_1^-\left(1 - a_{-1}^-\right) \quad (\because \text{symmetry}) \\ &= \mathbb{P}^{\Pi_k\text{s}}(z_{-1} \neq z_1 \mid \omega = -1) \\ &= \left(1 - a_1^-\right)\left(1 - a_1^+\right) + a_1^- a_1^+ \quad (\because \text{symmetry}) \\ &= 1 + 2 a_1^- a_1^+ - \left(a_1^- + a_1^+\right) \\ &= 1 - \frac{2 x_1 y_1 (1 + x_1 y_1)}{(x_1 + y_1)^2}. \end{aligned}$$

The last result implies that

$$D := \mathbb{P}^{\Pi_k\text{s}}(z_{-1} \neq z_1 \mid a_h = 1) = \frac{1}{2}\sum_{\omega \in \Omega} \mathbb{P}(z_{-1} \neq z_1 \mid \omega) = 1 - \frac{2 x_1 y_1 (1 + x_1 y_1)}{(x_1 + y_1)^2}$$

depends on $a_h = 1$ only through the $\Pi_k$s and not through the average probabilities of ω = 1 and ω = −1 […]. The remainder of the proof proceeds in three steps.

Step 1.

Reduce ξ to model primitives. Recall that

$$\xi := \mathbb{P}^{\Pi_k\text{s}}(R \text{ wins re-election} \mid \omega = 1) - \mathbb{P}^{\Pi_k\text{s}}(R \text{ wins re-election} \mid \omega = -1).$$

Consider two cases: $f \geq 1/2$ and $f < 1/2$. In the first case, candidate R wins re-election if and only if voter 0's signal recommends R, so $\xi = P_0$. In the second case, candidate R wins re-election in two events: (i) the signals of voters ±1 both recommend R; (ii) the signals of voters ±1 disagree and voter 0's signal recommends R. The part of ξ that stems from event (i) equals

$$a_{-1}^+ a_1^+ - a_{-1}^- a_1^- = \left(1 - a_1^-\right) a_1^+ - \left(1 - a_1^+\right) a_1^- = a_1^+ - a_1^- = P_1.$$

The part of ξ that stems from event (ii) equals

$$\mathbb{P}^{\Pi_k\text{s}}(z_{-1} \neq z_1, z_0 = R \mid \omega = 1) - \mathbb{P}^{\Pi_k\text{s}}(z_{-1} \neq z_1, z_0 = R \mid \omega = -1) = D a_0^+ - D a_0^- = D P_0.$$

Summing up these two parts, we obtain $\xi = P_1 + D P_0$.

Step 2.

Characterize equilibrium accountability. Suppose that voters use the $\Pi_k$s. By increasing his effort from low to high, candidate R changes his re-election probability by

$$\sum_{\omega \in \Omega} \left[p_1(\omega) - p_0(\omega)\right] \mathbb{P}^{\Pi_k\text{s}}(R \text{ wins re-election} \mid \omega) = \left(p_1(1) - p_0(1)\right)\xi.$$

He is willing to exert high effort if and only if $(p_1(1) - p_0(1))\,\xi \geq c$, or equivalently $\xi \geq \hat{c}$.

Step 3.

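Characterize equilibrium selection. Without accountability, voters pay no attention to the performance data, which makes selection impossible. With accountability, equilibrium selection equals

$$\begin{aligned} &\frac{1}{2}\sum_{\omega \in \Omega} \left[\mathbb{P}^{\Pi_k\text{s}}(R \text{ wins re-election} \mid \omega) \cdot \omega + \mathbb{P}^{\Pi_k\text{s}}(L \text{ wins election} \mid \omega) \cdot 0\right] \\ &= \frac{1}{2}\left(\mathbb{P}^{\Pi_k\text{s}}(R \text{ wins re-election} \mid \omega = 1) - \mathbb{P}^{\Pi_k\text{s}}(R \text{ wins re-election} \mid \omega = -1)\right) = \frac{\xi}{2}. \end{aligned}$$

Proof of Proposition 1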

When $P_0 = 0$, $\xi = P_1$, which is decreasing in v by Lemma 4. When $P_0 = 1$,

$$\xi = P_1 + D = 1 - \frac{2 x_1 y_1 (1 - x_1)(1 - y_1)}{(x_1 + y_1)^2},$$

which attains its maximum value of one if $x_1 = 1$ or, equivalently, if $(x_1, y_1)$ is the most R-biased element of $C^+(I)$. The following example illustrates that ξ can vary non-monotonically with v for intermediate values of $P_0$.

Example 1.

In the case where $h(x) = x^2$, a typical level curve of the attention cost function takes the form $C(I) = \{(|\mu_L|, \mu_R) : |\mu_L|, \mu_R \in [0, 1],\ |\mu_L|\,\mu_R = I\}$. We can then express $P_1$, D, and ξ as

$$P_1 = 2Iz, \quad D = 1 - 2I(1 + I)z^2, \quad \text{and} \quad \xi = 2Iz - 2P_0 I(1 + I)z^2 + P_0,$$

where $z := (|\mu_L| + I/|\mu_L|)^{-1}$. As we increase v from zero to infinity, the right-wing voter's news aggregator traverses the level curve C(I) from its neutral element $(\sqrt{I}, \sqrt{I})$ to its most R-biased element (1, I). During that process, z decreases from $1/(2\sqrt{I})$ to $1/(1 + I)$, so $P_1$ decreases whereas D increases. The overall effect on ξ depends on parameter values. The most interesting case is $2P_0(1 + I) \in (2\sqrt{I}, 1 + I)$. In that case neither the incentive effect nor the disagreement effect dominates, so ξ first increases and then decreases. ♦

Proofs of Propositions 2 and 3

The proofs follow immediately from Lemma 4 and are therefore omitted for brevity.
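A quick numerical check of Example 1 (and of the $P_0 = 1$ case in the proof of Proposition 1) makes the interplay between the incentive and disagreement effects concrete. The Python sketch below is a minimal illustration under stated assumptions: it takes $h(x) = x$, uses the expressions $P = 2Iz$, $D = 1 - 2I(1+I)z^2$, and $\xi = P + P_0 D$ from Example 1, writes the weight on $D$ as `P0`, and picks the bandwidth $I = 0.25$ purely for illustration. The helper names `example1_terms`, `xi_path`, and `shape` are hypothetical. For $I = 0.25$, the interval condition $2P_0(1+I) \in (2\sqrt{I}, 1+I)$ reduces to $P_0 \in (0.4, 0.5)$.

```python
import math


def example1_terms(x: float, I: float, P0: float) -> tuple[float, float, float]:
    """Return (P, D, xi) for the aggregator (|mu_L|, mu_R) = (x, I / x).

    Assumes h(x) = x, so C(I) is the curve |mu_L| * mu_R = I, and writes the
    weight on the disagreement effect as P0, so that xi = P + P0 * D.
    """
    y = I / x                                  # mu_R on the level curve C(I)
    z = 1.0 / (x + y)                          # z := (|mu_L| + I/|mu_L|)^(-1)
    P = 2.0 * I * z                            # incentive effect
    D = 1.0 - 2.0 * I * (1.0 + I) * z ** 2     # disagreement effect
    return P, D, P + P0 * D


def xi_path(I: float, P0: float, n: int = 201) -> list[float]:
    """xi along C(I) from the neutral element (sqrt(I), sqrt(I)) to the most
    R-biased element (1, I), i.e., as v rises from zero to infinity."""
    lo = math.sqrt(I)
    xs = [lo + (1.0 - lo) * k / (n - 1) for k in range(n)]
    return [example1_terms(x, I, P0)[2] for x in xs]


def shape(values: list[float]) -> str:
    """Coarse label by where the maximum sits: first index -> 'decreasing',
    last index -> 'increasing', interior -> 'hump'."""
    peak = max(range(len(values)), key=values.__getitem__)
    if peak == 0:
        return "decreasing"
    if peak == len(values) - 1:
        return "increasing"
    return "hump"


I = 0.25  # illustrative bandwidth: 2*sqrt(I) = 1.0 and 1 + I = 1.25
for P0 in (0.0, 0.45, 1.0):
    print(P0, shape(xi_path(I, P0)))
```

Run as written, the loop prints `0.0 decreasing`, `0.45 hump`, and `1.0 increasing`: with $P_0 = 0$ only the incentive effect operates, with $P_0 = 1$ the disagreement effect dominates, and $P_0 = 0.45$ falls in the interval where $\xi$ peaks in the interior. The coarse `shape` label is adequate here only because $\xi$ is a concave quadratic in $z$ and $z$ is monotone along the path.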

B Continuous state distribution

In this appendix, suppose that $\Omega = [-1, 1]$ and that, conditional on $a \in \{0, 1\}$, the performance state has a p.d.f. $q_a$ that is positive almost everywhere. Let $\alpha$ denote the fraction of high-ability incumbents in the candidate pool, and let $q := \alpha q_1 + (1-\alpha) q_0$ be the p.d.f. of the performance state in the case where $a_h = 1$. As before, let $\omega = \mathbb{E}[\theta \mid \omega; a_h = 1]$, which, together with Bayes' rule, implies that
$$\frac{(1-\alpha)\,q_0(\omega)}{\alpha\, q_1(\omega)} = \frac{\theta_h - \omega}{\omega - \theta_l}.$$

There is a single pro-$L$ voter with horizontal preference parameter $-v \in (-\infty, 0)$ and bandwidth $I > 0$. A news aggregator is a finite signal structure $\Pi : \Omega \to \Delta(Z)$. In the case where $a_h = 1$,
$$\pi_z = \int \Pi(z \mid \omega)\, q(\omega)\, d\omega$$
is the probability that the signal realization is $z$, which is assumed without loss of generality to be strictly positive, and
$$\mu_z = \frac{\int \omega\, \Pi(z \mid \omega)\, q(\omega)\, d\omega}{\int \Pi(z \mid \omega)\, q(\omega)\, d\omega}$$
is the posterior mean of the performance state conditional on the signal realization being $z$.

The voter's utility gain from using $\Pi$ is given by (3), and the amount of attention needed to absorb the content generated by $\Pi$ is
$$I(\Pi) = H(q) - \mathbb{E}_\Pi\big[H(q(\cdot \mid z))\big],$$
where $H$ is the entropy function and $q(\cdot \mid z)$ is the posterior density of the performance state given signal realization $z$. The voter's problem is given by (6). In the case where $I < H(q)$, any solution to this problem must satisfy Lemma 2, by a straightforward extension of Matějka and McKay (2015). Given that solution, accountability is sustainable if and only if
$$P := \int m(\omega)\big(q_1(\omega) - q_0(\omega)\big)\, d\omega \ge c, \qquad \text{where } m(\omega) := \Pi(R \mid \omega).$$

In the baseline model with binary states, Lemma 4 shows that $P$ is decreasing in $v$. The next example shows that with a continuum of states, $P$ can increase with $v$, which reinforces our message that the EAS effect of rational and flexible attention allocation can be quite subtle.

Example 2.

Suppose $\theta_h = 1$, $\theta_l = -1$, $\alpha = 1/2$, $q_1(\omega) = (1+\omega)/2$, and $q_0(\omega) = (1-\omega)/2$, so that $q(\omega) = 1/2$ and, because $q_1(\omega) - q_0(\omega) = \omega$, $P = \int \omega\, m(\omega)\, d\omega$. In the case where $I = 0.1$, solving $P$ numerically for $v = 0.24$ and $0.25$ yields $0.13$ and $0.14$, respectively. To develop intuition for why $P$ could increase rather than decrease with $v$ (i.e., why the incentive power generated by the news aggregator increases as the voter becomes more partisan), note that when $v = 0.25$, the voter is strongly biased toward candidate $L$ and hence spends most of his attention on distinguishing whether $\omega$ is close to 1 or not. The resulting function $m$ is flat and takes small values for most $\omega$s but rises sharply as $\omega$ approaches 1.

Figure 1: Plot of $m(\omega)$ against $\omega$ (curves indexed by $v$); model parameters as specified in the text.

As $v$ decreases from $0.25$ to $0.24$, the voter becomes more moderate and so spends his attention more evenly on distinguishing the various states. Also, his average propensity to vote for candidate $R$ increases. Together, these imply that the function $m$ takes a higher value on average but is flatter around $\omega \approx 1$. Since $P = \int \omega\, m(\omega)\, d\omega$, if the complementarity between $\omega$ and $m(\omega)$ is sufficiently strong around $\omega \approx 1$, then $P$ can, and indeed does, increase as we raise $v$. ♦
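The objects defined for the continuous-state case can be computed directly by quadrature. The Python sketch below uses Example 2's parameters ($\theta_h = 1$, $\theta_l = -1$, $\alpha = 1/2$, $q_1(\omega) = (1+\omega)/2$, $q_0(\omega) = (1-\omega)/2$) but, as a clearly flagged assumption, replaces the voter's optimal aggregator with a hypothetical logistic vote-$R$ rule `m_hypothetical` (slope and cutoff chosen arbitrarily), because the voter's problem (6) is not restated here. It therefore only illustrates the formulas for $\pi_z$, $\mu_z$, $P$, and $I(\Pi) = H(q) - \mathbb{E}_\Pi[H(q(\cdot \mid z))]$, treating $H$ as differential entropy, and it does not reproduce the reported values 0.13 and 0.14. The helpers `integrate`, `entropy`, and `summarize` are illustrative names, and the midpoint rule is an assumed numerical method.

```python
import math

# Example 2 parameters, taken from the text.
THETA_H, THETA_L, ALPHA = 1.0, -1.0, 0.5


def q1(w: float) -> float:
    """Density of the performance state when a = 1."""
    return (1.0 + w) / 2.0


def q0(w: float) -> float:
    """Density of the performance state when a = 0."""
    return (1.0 - w) / 2.0


def q(w: float) -> float:
    """Mixture alpha*q1 + (1 - alpha)*q0; identically 1/2 in Example 2."""
    return ALPHA * q1(w) + (1.0 - ALPHA) * q0(w)


def posterior_mean_ability(w: float) -> float:
    """E[theta | w] by Bayes' rule; should return w itself."""
    return (ALPHA * q1(w) * THETA_H + (1.0 - ALPHA) * q0(w) * THETA_L) / q(w)


def integrate(f, a: float = -1.0, b: float = 1.0, n: int = 20_000) -> float:
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def entropy(f) -> float:
    """Differential entropy -int f ln f over [-1, 1], with 0 ln 0 := 0."""
    def integrand(w: float) -> float:
        v = f(w)
        return -v * math.log(v) if v > 0.0 else 0.0
    return integrate(integrand)


def m_hypothetical(w: float, slope: float = 8.0, cutoff: float = 0.6) -> float:
    """HYPOTHETICAL Pi(R | w): a logistic stand-in, not a solution to (6)."""
    return 1.0 / (1.0 + math.exp(-slope * (w - cutoff)))


def summarize(m) -> dict[str, float]:
    """pi_R, mu_L, mu_R, P, and attention I(Pi) for the two-signal aggregator m."""
    pi_R = integrate(lambda w: m(w) * q(w))
    pi_L = 1.0 - pi_R                        # q integrates to one on [-1, 1]
    mu_R = integrate(lambda w: w * m(w) * q(w)) / pi_R
    mu_L = integrate(lambda w: w * (1.0 - m(w)) * q(w)) / pi_L
    P = integrate(lambda w: m(w) * (q1(w) - q0(w)))

    def post_R(w: float) -> float:           # posterior density q(w | z = R)
        return m(w) * q(w) / pi_R

    def post_L(w: float) -> float:           # posterior density q(w | z = L)
        return (1.0 - m(w)) * q(w) / pi_L

    attention = entropy(q) - pi_R * entropy(post_R) - pi_L * entropy(post_L)
    return {"pi_R": pi_R, "mu_L": mu_L, "mu_R": mu_R, "P": P, "I": attention}


stats = summarize(m_hypothetical)
print({key: round(val, 4) for key, val in stats.items()})
```

The function `posterior_mean_ability` confirms the normalization $\omega = \mathbb{E}[\theta \mid \omega]$ under these parameters, and `stats["P"]` agrees with $\int \omega\, m(\omega)\, d\omega$ because $q_1 - q_0 = \omega$. Reproducing the numbers in Example 2 would require replacing `m_hypothetical` with a numerical solution to (6).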